Speech And Hearing Research


Gordon E. Peterson 


Communication has long been recog- 
nized as one of the most fundamental 
components of human behavior. The 
importance of a technical understand- 
ing of the basic processes of communi- 
cation, however, has only recently 
been generally recognized. Today 
many of the major universities main- 
tain research laboratories for the study 
of speech and hearing processes. These 
laboratories vary in nature and orien- 
tation from speech and psychology to 
medicine and electrical engineering. 

Laboratory research is no stronger, 
however, than the theoretical formu- 
lations and hypotheses which underlie 
it. As in any developing experimental 
field, at times there is a temptation to 
explore without plan and to report 
without critical analysis and evalua- 
tion. Fortunately, increased attention 
is now being given to the theory of 
speech communication. Direct experi- 
ence with the problems and confusions 
of the time is a strong and perhaps 
necessary stimulus to theoretical ex- 
ploration. 





Gordon E. Peterson (Ph.D., Louisiana 
State University, 1939) is Professor of 
Speech and Director of the Speech Research 
Laboratory at the University of Michigan. 
This article is based on a paper presented 
at the Conference on Speech and Hearing 
Research sponsored by the National Instit- 
ute of Neurological Diseases and Blindness 
of the National Institutes of Health on No- 
vember 23, 1957, following the convention 
of the American Speech and Hearing As- 
sociation in Cincinnati. 
Such areas as symbolic logic, information
theory, and the theory of
signal detection provide much of the 
essential mathematical foundation for 
speech communication theory. But 
these are basic tools, not the complete 
theory. The study of speech as a be- 
havioral process involves, in addition 
to mathematics, such areas as acoustics, 
physiology, learning theory and lin- 
guistics. Thus, it is strange that some 
have looked to information theory 
for a basic solution or description of 
the speech communication processes, 
and others have been inspired to treat 
all communication systems as binary 
in character. Not all systems, communication
or otherwise, are structured
to suit the convenience of the mathematician’s
log₂ N. The components of
a structure suit the structure, not some
arbitrary numerical simplicity. By
evaluating various communication op- 
erations in terms of equivalent binary 
selections, a common and powerful 
base is provided for their comparison; 
but there is no reason to assume that 
because such an evaluation of a proc- 
ess is possible the process is funda- 
mentally binary in character. 
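
The evaluation of an operation in terms of equivalent binary selections can be illustrated with a small numerical sketch. The function name below is hypothetical, and the sketch assumes that the N alternatives are equally likely, an assumption that the text does not state.

```python
import math


def equivalent_binary_selections(n_alternatives: int) -> float:
    """Return log2(N), the number of equivalent binary selections.

    Assumes the N alternatives are equally likely; this simplifying
    assumption is ours and is not stated in the text.
    """
    if n_alternatives < 1:
        raise ValueError("need at least one alternative")
    return math.log2(n_alternatives)


# A selection among eight alternatives is equivalent to three binary selections.
eight_way = equivalent_binary_selections(8)

# A selection among three alternatives has a fractional equivalent, so the
# measure applies even though the selection is plainly not binary.
three_way = equivalent_binary_selections(3)
```

A three-way selection evaluates to roughly 1.58 binary selections. The comparison with the eight-way case is possible, yet nothing in the three-way selection itself is built from two-way choices.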


A Speech Communications 
Diagram 


An attempt to represent the basic 
physiological systems involved in 
speech communication is shown in 
Figure 1. A schematic representation of a simple communications link, with the speaker
in the upper left and the listener in the upper right: S—generalized sensory system, B— 
brain, E—ear, M—Motor mechanism of speech. The three circles in the lower portion 
of the diagram represent an experimenter: S—generalized sensory system, B—brain, M— 
generalized motor system. The dashed circles and arrows represent measuring instruments. 


the simple schematic of Figure 1. In 
the upper portion of this diagram is 
represented a basic speech communi- 
cation link composed of two individ- 
uals. This figure resembles the func- 
tional diagrams drawn by communica- 
tion engineers with input, output and 
feedback or servo controlling paths. 
The speaker is indicated by the four 
circles in the upper left and the lis- 
tener is indicated by the four circles 
in the upper right. S indicates a gen- 
eralized sensory input for supplying 
information to the system. B repre- 
sents the brain, or more generally 
the nervous system, of the individual. 


E represents the auditory sensory 
system and M represents the motor 
mechanism of speech. The solid line 
from M to E and the dashed line from
M to B represent the two basic meth- 
ods of referring information about 
the output back to the neural con- 
trolling components. The time func- 
tion of the signal produced at M is 
now well known to be highly de- 
pendent upon the time constants of 
the mechanical system and the neural 
paths, including the feedback linkages;
also, in the development of speech its 
spectral structure has probably been 
influenced by both the mechanical and 
acoustical properties of M and by
the frequency range and sensitivity 
of E. 

While the system shown here over- 
laps with and is dependent upon many 
of the basic systems of human physi- 
ology, there is considerable justifica- 
tion for including the speech com- 
munication system as one of the 
fundamental systems of human physi- 
ology. The highly developed ability 
to communicate is one of the basic 
distinctions between man and other 
animals. Regardless of whether certain 
physiological correlates of this ability 
can be specified independently, it is 
clear that a very high degree of or- 
ganization of human function is re- 
quired for speech communication. 


The Experimental Study of 
Speech and Hearing 


While the processes of communica- 
tion are fundamental, they are not to 
be defined within any single tradi- 
tional academic discipline, for the 
knowledge essential to an understand- 
ing of the communication processes 
extends from mathematics to linguis- 
tics and from human physiology to 
speech. 

In order to study these processes 
the experimenter should observe 
them independently. He cannot serve 
his function effectively while he is in 
addition serving as the listener
in the diagram of Figure 1. He must 
employ those experimental and re- 
search techniques which make it possi- 
ble for him to obtain valid and reliable 
information. The three circles shown 
toward the bottom of Figure 1 repre- 
sent such an observer; the schematic 
meters represent instrumentation 
which he might employ. In the case 
of this third observer, M represents 
generalized motor processes which
may be involved in conducting re- 
search. The experimenter may in some 
temporary manner alter the communi- 
cation mechanisms he studies, (M-M, 
M-E). In research in the physiology 
of hearing, animal experiments have 
provided much valuable information. 
Organic speech and hearing disorders 
also provide special modifications of 
the human system which may be 
studied. 


In order to simplify the diagram, 
connections into the nervous systems 
of the speaker and the listener are
not indicated. There is little question, 
however, but that a primary source 
of information about the processes of 
communication lies within the central 
nervous system of the speaker. It is 
fortunate that research on the neural 
physiology of speech and hearing is 
now progressing at an accelerated rate. 
At present such neural research is 
indeed far from defining the corre- 
lates of the linguistic structures and 
the semantic referents of speech and 
it is probably not for one writing in 
this time to predict whether such cor- 
relates will ever be specified in a 
useful form. 


The Symbolic Components of 
Speech 


It is common knowledge that the 
motor production of speech primarily 
involves an organization of portions 
of the tracts of digestion and respira- 
tion. It is of significance that the oral 
opening into the pharynx provides an 
emergency air inlet when the normal 
air processing mechanism becomes oc- 
cluded. Since the two tracts are 
crossed, the velar and laryngeal valves 
are essential to the protection of the 
respiratory system. The organs of 
mastication and deglutition combined 
with these valves result in an extremely
flexible tone and noise generating
and controlling system.


The symbolic components of speech 
are restricted to the acoustical proper- 
ties of these tracts. It is well known 
that most languages involve a broad 
sample of various types of sound 
formation; but it is also well known 
that languages differ greatly in pho- 
netic detail and in their use of signifi- 
cant contrasts among speech sounds. 
For example, in addition to the use 
of intonation, laryngeal tone is used 
in some languages as part of the iden- 
tity of each word, whereas in other 
languages the use of laryngeal tone is 
restricted primarily to intonation pat- 
terns. English generally employs three 
pairs of contrasting voiceless-voiced 
plosive consonants; some languages 
employ only two and others employ 
four or more. To illustrate further, the 
constrictions and closures for the con- 
sonants of English are almost ex- 
clusively formed in the oral cavity, 
while Arabic employs a strong com- 
plement of pharyngeal consonant for- 
mations. 

It does not yet appear to be ade- 
quately recognized that speech in- 
volves a varying number of simul- 
taneous codes. These involve such 
parameters as the energy of speech
(primarily controlled by sub-laryngeal
pressures), laryngeal tone, laryngeal
quality and supra-laryngeal constants 
(such as pharyngealization, velariza- 
tion, palatalization, labialization and 
nasalization). The articulatory forma- 
tions, in the sense of traditional de- 
scriptive phonetics, are thus only one 
aspect of a complicated system of 
codes found in various forms in every 
language. 

It is these parameters and articula- 
tory formations which are represented 
in the acoustical signal. While the
acoustical wave of speech is a simple 
uni-dimensional pressure function of 
time, the analysis of this wave into its 
essential acoustical parameters is a 
complex and difficult process. These 
acoustical parameters do not in general 
have a simple and direct correspond- 
ence to the physiological parameters. 
In addition, disturbing noise is nor- 
mally added, so that the acoustical 
signal is a somewhat incomplete and 
degraded representation of the physi- 
ological activity. With special instru- 
mentation, in fact, it may be easier to 
observe certain aspects of the physio- 
logical activity directly, rather than 
through analysis of the acoustical 
wave. 

We may consider the purpose of 
the activity of the speech musculature 
to be the production of an acoustical 
wave. This muscular activity, in turn, 
is controlled by an integrated neural 
activity. Because of the factors of 
noise and distortion, a precise descrip- 
tion of the muscular activity cannot 
be derived from the acoustical wave 
and a precise description of the neural 
innervations cannot be derived from a 
knowledge of the muscular move- 
ments. It is the actions of these sys- 
tems which are the manifestations of 
speech. It is their processes which may 
be studied and analyzed and it is from 
them that the structural properties of 
the speech code may be derived. 


The Speech Information Source 


Controlling the lower level neural 
activity involved in speech production 
is the activity of the motor cortex. 
Also, in order to produce speech it is 
obvious that a higher level symbolic 
formulation must underlie the actual 
expression. The organization of the 
speech movements into units which
may bear meaning is a cortical func- 
tion involving learning and memory. 
At this level we become involved in 
the association activities of the brain 
and the semantic aspects of the speech 
processes. 

When compared with other animals, 
man has a tremendous elaboration of 
the cerebrum. Other animal forms 
have motor mechanisms capable of 
producing a highly varied assortment 
of sounds, but these animals do not 
have speech which approaches the 
highly developed system employed by 
man. Since it is primarily the extensive 
development of the cerebral hemi- 
spheres which distinguishes the central 
nervous system of man from that of 
other animals and since only man has 
a highly developed system of speech, 
we may conclude that the primary in- 
formation source of speech lies within 
the cerebrum. In such an interpreta- 
tion it must be noted that, in general, 
neural systems have developed ac- 
cording to the properties of sensory 
and motor systems and, further, it 
must be noted that the motor systems 
of the body provide the outlets or 
representations of neural activity. 


Orthography versus Exact Speech 
Representation 


In general, speech is considered to 
be the basic form of language. There 
is much present activity, in fact, in 
the derivation of orthographic systems 
for previously unwritten languages. 
Such writing is normally based upon 
the spoken language, but must have a 
broader application, of course, than to 
specific utterances and specific dia- 
lects. The writing thus becomes an 
abstraction, which may apply exactly 
to few or to no actual dialects or 
utterances. Since such writing is 
created by the human and is also
interpreted by him, it may be entirely
practical and efficient for communication
purposes. The human is quite
capable of converting any number of 
written codes to his normal speech 
system, even though the codes have 
only a very indirect correspondence 
to his speech system. 

A scientific symbolic representation 
of actual speech, on the other hand, 
requires a phonetic (instrumental and 
descriptive) representation of specific 
utterances. Such specific descriptive 
notations may also be generalized into 
phonemes, and on the basis of distribu- 
tion or other criteria organized into 
morphemes, etc. These representations 
will apply to specific utterances and 
dialects, however, and their applica- 
tion to a more general notation for a 
language may be limited. Specialized 
representations are doubtless more 
practical for automatic speech recog- 
nition where memory and learning 
ability, at least for the present, are 
much more limited for machines than 
for the human. Practically, such limi- 
tations would restrict the range of 
operation of such machines. 

At the present time there is much 
confusion over whether language 
structure can be properly defined 
without close observance of phonetic 
detail. This confusion appears to be 
primarily a matter of definition of 
terms. If a practical system of orthog- 
raphy which can be written and read 
by humans is assumed to define ade- 
quately a ‘language structure,’ then 
the system need not follow a con- 
sistent plan of organizing phonetic 
data. If ‘language structure’ is pre- 
sumed to be a consistent representa- 
tion of actual speech patterns, how- 
ever, then the foundation of the de- 
scriptive system must lie in phonetic 
accuracy. It is this latter approach
which is pursued by many and it is 
this latter approach which appears to 
be basic to a number of important 
applications in the field of speech 
analysis. The complications of the 
field appear to be appreciated by rela- 
tively few. The phonetic variations 
which occur in speech were not 
clearly recognized until the develop- 
ment of the sound spectrograph and 
even yet the extent of these variations 
has received little experimental atten- 
tion. 


The Basic Processes of Speech 
Production 


The process of speech production 
has bewildered its students inces- 
santly, for the vocal mechanism in- 
volves the complex functioning of 
many organs and structures of the 
human body. Certain of the simpler 
aspects of the process, however, are 
relatively clearly understood. For ex- 
ample, the distinction between phona- 
tion and whispering has been clearly 
demonstrated in high speed motion 
pictures of the vocal cords. Further- 
more, descriptions of the articulatory 
positions of the vowels and consonants 
of many languages have appeared in 
numerous texts. Also, successful tech- 
nical descriptions of the acoustical 
properties of vowels have recently 
been developed. But in many of its 
more complex aspects the mechanism 
of speech production is very poorly 
understood. 

One of the most basic and most 
challenging questions is whether respir- 
atory impulses provide a basis for 
phonetic syllables. Since the articula- 
tions of speech generate back pressures 
in the thorax, simple sub-laryngeal 
pressure measurements are not adequate
to answer this question. Basically,
it is necessary to know
whether during exhalation for speech 
the respiratory musculature is inner- 
vated at rates which correspond to the 
occurrence of certain types of articu- 
latory sequences. Acoustical, X-ray, 
motion picture, electro-myographic 
and other techniques might be em- 
ployed to investigate this question, but 
the research is yet to be done. As a 
hypothesis, this author would propose 
that separate respiratory driving forces 
can be identified for ‘syllables’ only 
at very slow speech rates. If this hy- 
pothesis is correct, then the definition 
of a syllable at the phonetic level must 
be expressed primarily in terms of 
articulatory and other laryngeal and 
supra-laryngeal parameter sequences. 
On the other hand, if a respiratory 
basis for the syllable at conversational 
speech rates is demonstrated experimentally,
then current theories of
speech production will require modi- 
fication. 


Laryngeal and supra-laryngeal vocal 
qualities form a second major area 
which offers much opportunity for 
research. Early efforts suffered considerably
from an attempt to deal with
vocal qualities as perceptual abstrac- 
tions. According to the previous dis- 
cussion of Figure 1, in the description 
of speech properties a primary em- 
phasis should be placed upon the 
generating mechanism. In accord with 
this view is the current growing em- 
phasis upon the relationships among 
the physiological, the acoustical and 
the perceptual aspects of vocal quality. 

Laryngeal quality, like many other 
parameters of speech, may have se- 
mantic significance. In fact, in some 
languages, such qualities as breathiness 
provide minimal distinctions between 
words. A little explored and highly 
rewarding field of research lies in the 
description of various laryngeal qualities
according to organic structure
(especially pathological), actions of 
the larynx and resulting acoustical 
parameters. 


Almost equally important is the area 
of supra-laryngeal qualities. These 
qualities result from characteristic 
tongue, jaw, lip and velar positions 
(and in some cases associated struc- 
tural anomalies) which pervade the 
speech pattern. Such factors serve as 
a constant influence or bias upon the 
total production. In the acoustical 
patterns we would expect these factors 
to influence the dynamics of the 
process and to have systematic influ- 
ences upon the spectrum, such as shift- 
ing the frequency positions of certain 
formants or adding certain secondary 
formants. 


The physiological articulations of 
the consonants have long been de- 
scribed in terms of organic position 
and manner of production. In general, 
however, these descriptions are at the 
level of casual observation and per- 
sonal opinion. The concept of speech 
production as a combined mechanical 
and acoustical process, in which there 
is a balance between articulatory 
(muscle) tension and driving breath 
pressure, needs much further emphasis. 
The faulty articulation of the cleft 
palate speaker provides an excellent 
example of failure to maintain a 
proper balance of tension and pharyn- 
geal and oral pressure. It is probable 
that within the next few years there 
will be a major development of physi- 
cal theory and description of con- 
sonant sound formation. There is great 
opportunity for the application of 
modern, calibrated electro-acoustical 
instrumentation for observations on 
consonant sound formation. Devices 
are available which can be applied to 
the study of many aspects of the articulatory
processes, but at present this
area of research is essentially un- 
developed. 

Closely related to the problems of 
the above area are those of studying 
nasalization. There has been consider- 
able advance in the theory of the in- 
fluence of the velar opening and the 
nasal cavities upon vocal resonance. A 
study of palatal action and acoustic 
correlates in normal speakers, how- 
ever, is yet to be achieved. The probe- 
tube for acoustical measurements and 
the modified naso-pharyngoscope for 
direct viewing or photographing of 
velo-pharyngeal valve action are essen- 
tial tools in such research. 


Applications of Speech and 
Hearing Research 


The above discussion is primarily 
in terms of problems in the basic 
understanding of the speech and hear- 
ing processes; little consideration has 
been given to application. Neverthe- 
less, it is often the recognition of new 
opportunities and developments which 
underlies and directs the goals of basic 
research. The fault in so-called ‘ap- 
plied’ research is not with the purpose, 
but with the superficial means that are 
often adopted. ‘Research’ of the trial- 
and-error type rarely solves any prob- 
lems or results in a basic advance 
toward an applied goal. A basic and 
systematic approach to research on 
applied problems, however, is becom- 
ing increasingly common and its im- 
portance, particularly in industrial 
laboratories, is now well recognized. 

Not many years have passed since 
the applications of speech and hearing 
research were generally considered to 
be trivial and perhaps non-existent. 
With an increasing emphasis upon the 
technical aspects of communication, 
however, it is difficult to predict the 
extent of the applications. 








10 JOURNAL OF SPEECH AND HEARING RESEARCH 


The importance of a basic under- 
standing of the processes of speech 
and hearing is now well recognized in 
the fields of speech correction and 
audiology. A knowledge of normal 
processes, however, has often been 
aided by work with the abnormal. But 
the speech and hearing pathologist 
must not be deluded into complacency 
by this observation, for an understand- 
ing of the normal is fundamental to 
understanding the abnormal. In fact, 
the development of basic theory of 
function and of new techniques of 
instrumental observation often applies
to normal and abnormal alike. It is 
entirely possible to conduct basic and 
substantial clinical research. 

Basic knowledge of the speech and
hearing processes is of importance, of 
course, in the areas of general speech. 
A different and important application 
of this knowledge is in the field of 
communications technology. This field 
is expanding rapidly and involves 
many basic operations with speech. 
Primary among these are the auto- 
matic identification of speech signals, 
the instrumental synthesis of speech 
and the compression of speech with a 
retention of its intelligibility for 
transmission over channels of low 
information capacity. These opera- 
tions require much further knowledge 
of both the mechanical and the sym- 
bolic aspects of speech. Objectives in 
communications technology will un- 
doubtedly continue to serve as a major 
force in guiding phonetics research. 

As indicated above, an orthography 
which is developed for a language 
need have no consistent correspond- 
ence to the speech. For example, re- 
dundancy in English generally makes 
it possible to deal easily with words 
which are pronounced alike but 
spelled differently and with words 
which are spelled alike but pronounced
differently. If the field of
linguistics is considered to be pri- 
marily concerned with the technical 
description of the speech system or 
structure, it extends well beyond the 
objective of reducing languages to a 
practical orthography. Thus the scien- 
tific treatment of speech structure is 
not an application of experimental 
phonetics, but involves experimental 
phonetics as an essential basis to a 
field of knowledge. 


The Nature of Research 


Because the study of the speech 
processes is essentially interdisciplinary 
in character, a meaningful and basic 
organization of knowledge about these 
processes is not easily achieved. It is 
the modification and development of 
this knowledge which is the primary 
objective of speech and hearing re- 
search. Research is not a process of 
interesting exploration, independent 
of the fund of information already 
available. Whether at this time or at 
any particular time the fund is con- 
sidered great or small depends upon 
the perspective with which one views 
the unsolved problems and the possi- 
bilities of the future. Fundamentally, 
research is a process which corrects 
and modifies, which expands and adds 
to a body of knowledge about a field. 
In order to do research, then, one 
must have information, and in order 
to do significant and enduring re- 
search he must have a broad knowl- 
edge of the information in both the 
immediate field involved and in related 
fields. 

Thus, the selection of a specific re- 
search problem must grow out of a 
knowledge of a field and its problems. 
The plan of an experiment is not 
something which can be rigidly set 
and followed blindly as the data are 
procured. The analysis of data is not
a routine statistical operation, but one 
requiring a constant critical evaluation 
and interpretation. In the written re- 
port there is no substitute for careful 
and concise statement and repeated 
revision. Often it is not until the 
actual writing is attempted that faults 
in basic objective, experimental design 
and gaps in the data become obvious. 
These discoveries should send one 
back to the laboratory, not motivate 
an attempt to conceal the weakness of 
the study in vague and ambiguous 
statements. 

Important research, then, is the 
product of an inquiring mind and a 
determination to learn more about a 
field. There may be temporary satis- 
faction in the successful completion of 
a specific research project, but no 
sincere and determined scholar can 
know ultimate satisfaction, for the 
quest for knowledge is unending and 
its opportunities are limitless. It is of 
significance that some workers main- 
tain a high level of research contribu- 
tion regardless of limitations in facili- 
ties and in spite of other responsibili- 
ties. It is somewhat alarming that so 
few of the more substantial positions 
in the field of speech and hearing 
presume a primary responsibility to 
research. If the field is to achieve and 
retain a proper stature among the
academic disciplines, it is our responsibility
to see that this situation is improved.
Within these disciplines there
is no substitute for progress in funda- 
mental research. 


Summary 


This paper has been concerned with 
the basic problems of research on 
human communication. A simple 
speech communication diagram is pre- 
sented in which the experimenter is 
represented as a third individual who 
analyzes the processes involved in 
speech production and speech percep- 
tion. It is suggested that the motor 
neural aspects of speech form the 
primary information source. The sym- 
bolic representation of speech in 
which there is an accurate and consistent
correspondence with a given
dialect is contrasted with an orthog- 
raphy which is applied generally to 
a language. It is noted that in the 
study of speech production there are 
crucial unsolved problems concerning 
respiration, phonation, articulation and 
nasalization. It is emphasized that a 
basic approach to research on applied 
problems in the field of speech and 
hearing is most productive and that 
research is fundamentally an informa- 
tion seeking process. 








Effects Of Delayed Auditory 
Feedback Upon Articulation 


Grant Fairbanks 


Newman Guttman 


An investigation of the influences of 
delayed auditory feedback upon dif- 
ferent speech variables was reported 
in a previous article (1). Four vari- 
ables were considered and the influ- 
ences were found to be dissimilar. The 
results were interpreted as supporting 
the conclusion that disturbed articula- 
tion and increased duration are ‘direct 
effects’ of time delay; that greater 
sound pressure and higher fundamen- 
tal frequency are ‘indirect effects,’ 
evidencing ‘effort to maintain system 
control’ and ‘resist experimental inter- 
ference with the response’; and that 
articulatory disturbance is the ‘pri- 
mary effect (1).’ The nature of the 
disturbance of articulation is the sub- 
ject of the present article. It consti- 
tutes a second report of data from the 
experiment and presents the results of 
an attempt to make an orderly, pri- 
marily phonetic description of a 
speech display that, as observed earlier,
‘is often so chaotic that conventional
measurements do not describe
it (1).’ Whereas the first report con- 
sidered only the number of articula- 
tory errors, without regard to type, 
and restricted the observations to a 
sentence drawn from a longer sample, 
the analysis reported here gave special 
attention to the various types of errors 
and extended the treatment to the 
complete sample. 


Procedure 


Since the experiment has been de- 
scribed elsewhere (1), only its general 
plan need be reviewed. The subjects 
were 16 young men, each of whom 
read the same prose passage seven 
times as follows: a free pre-experi- 
mental reading without amplification 
or earphones; five experimental read- 
ings with amplification via earphones 
and with time delays of 0, 0.1, 0.2, 0.4 
and 0.8 sec.; a post-experimental read- 
ing under the same conditions as the 
0-sec. experimental reading. The per- 
formances were recorded. A six- 
sentence experimental passage was 
used, of which the middle four sen- 
tences, totaling 55 words, were studied 


March 1958
in the present analysis. (The measurements
reported earlier were confined
to the third of the four sentences, a 
13-word sample.) 

The overall articulatory accuracy 
of each sample was estimated by 
counting the number of correct words. 
This measure was introduced and discussed
in the first report. ‘The procedure
was to listen to the recording
of each individual word as many 
times as necessary and... tally it as 
correct or incorrect ... The standard 
was that of acceptability of articula- 
tion and pronunciation interpreted 
liberally, and each word was presumed 
correct unless clearly incorrect. 
[When material was inserted between 
words the practice was] to reduce 
the count of correct words by one in
each such instance, regardless of the 
amount of inserted material, except 
when both of the text words bound- 
ing the insertion were themselves in- 
correct (1).’ The ratio of this count 
to the total reading time furnished a 
measure of articulatory efficiency in 
correct words per second. This has 
been termed correct word rate and 
it also was discussed in the first report. 
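
The counting rule and the rate quoted above can be restated as a short computational sketch. The function names and the encoding of insertions as gap indices are hypothetical conveniences, not part of the original procedure, and the sample data in the example are invented for illustration only.

```python
from typing import Iterable, List


def count_correct_words(word_is_correct: List[bool],
                        insertion_gaps: Iterable[int]) -> int:
    """Count correct words, applying the stated deduction for insertions.

    word_is_correct[i] is True when text word i was judged correct.
    insertion_gaps holds each index i at which material was inserted
    between text word i and text word i + 1 (a hypothetical encoding).
    """
    count = sum(word_is_correct)
    for gap in set(insertion_gaps):
        both_incorrect = (not word_is_correct[gap]
                          and not word_is_correct[gap + 1])
        if not both_incorrect:
            # One deduction per insertion, regardless of its length.
            count -= 1
    # No floor is applied; the quoted rule does not say how a negative
    # result would have been handled.
    return count


def correct_word_rate(correct_words: int, reading_time_sec: float) -> float:
    """Return correct words per second of total reading time."""
    if reading_time_sec <= 0:
        raise ValueError("reading time must be positive")
    return correct_words / reading_time_sec


# Invented example: 13 words, the first 10 judged correct, with insertions
# after words 2, 9 and 10 (zero-based). Only the first two are deducted,
# since words 10 and 11 are both incorrect.
flags = [True] * 10 + [False] * 3
example_count = count_correct_words(flags, [2, 9, 10])
example_rate = correct_word_rate(example_count, 4.0)
```

In this invented case the count is 8 and, for a 4-second reading, the rate is 2 correct words per second.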

Errors of articulation were studied 
by means of an independent procedure 
which was performed at a different 
time. The basis of the analysis was a 
reference transcription of the passage 
which showed the expected phonetic 
output in an undisordered reading. 
For the articulation of each element, 
this constituted a standard ‘which 
would be regarded as not having been 
achieved by an obtained articulation 
only when the latter was obviously 
outside a wide region of acceptability 
(1).’ This kind of standard implies
a large number of admissible alternatives
and a few illustrations will show
the level of specification employed. 


In the words beyond the horizon, for
instance, the standard was as follows:

beyond:   [b]  [ɪ, i]  [j]  [ɑ, ɔ]  [n]  [d] or omit
the:      [ð]  [ə, ɪ]
horizon:  [h] or omit  [ə, ɔ, o, oʊ]  [r]  [aɪ, ɑɪ]  [z]  [n, ən]

Examples of other alternatives are
[ʍ] or [w] in white, [θ] or [ð] in
with, omission of [d] in and its, ends
and finds, omission of one [l] in
people look.

A phonetic transcription of each 
sample was made for comparison with 
the reference. The practice was to 
transcribe broadly by phrases as 
spoken, recording stress patterns, and 
the transcription served as a working 
data sheet for the error analysis. All 
phonetic deviations from the reference 
were located and classified according 
to major type of error, substitution, 
omission or addition, plus a miscel- 
laneous category explained below. As 
will appear, each instance was then 
sub-classified according to various as- 
pects thought to be of interest. All 
articulatory elements represented in 
the reference transcription were po- 
tential points of error, but when the 
same type of deviation occurred in 
two or more consecutive elements 
(e.g., an omitted poly-phonemic
word), one instance of error was 
noted and its length accounted for in 
sub-classification. Different types oc- 
curring in succession, however, were 
counted as separate instances.

Table 1. Group means for measures of correct words, duration and correct word rate. Lower
section converted from previous report (1).

                                        Time Delay (sec)
                              0*     0†     0      .1     .2     .4     .8

55-Word Passage
  Correct Words (%)           94     94     95     81     69     73     82
  Duration (sec/55)           .31    .31    .32    .43    .50    .47    .41
  Correct Word Rate (N/sec)   3.1    3.1    3.0    1.9    1.5    1.8    2.1
13-Word Sentence
  Correct Words (%)           85     82     82     67     58     62     67
  Duration (sec/13)           .24    .25    .25    .38    .43    .40    .32
  Correct Word Rate (N/sec)   3.5    3.4    3.3    1.8    1.5    1.7    2.2

* Pre-experimental
† Post-experimental

Additions to the expected output were
numerous, as will be seen, and all
change-points between elements were 
considered as possible loci. Each addi- 
tion was tallied once, regardless of its 
other characteristics. Thus, in the 
general count, error refers to an in- 
stance of error of a specified major 
type, consisting of a run of one or
more elemental deviations of that
type.
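
The run-counting convention can be illustrated briefly. The sketch below assumes, for simplicity, that each element of the reference transcription carries at most one deviation label; the labels, the list encoding and the function name are hypothetical, and additions (which were tallied at change-points between elements) are not modeled.

```python
from itertools import groupby


def count_error_instances(deviations):
    """Count instances of error per major type.

    deviations -- one entry per reference element: None if the element
                  was acceptable, otherwise the type of deviation.
    A run of consecutive deviations of the same type is one instance;
    different types in succession are separate instances.
    """
    counts = {}
    for kind, _run in groupby(deviations):
        if kind is not None:
            counts[kind] = counts.get(kind, 0) + 1
    return counts


# An omitted three-phoneme word, followed at once by a substitution,
# then a correct element, then a further single omission.
elements = [None, "omission", "omission", "omission",
            "substitution", None, "omission"]
instances = count_error_instances(elements)
```

In this example the three consecutive omissions count once, so the tally is two omissions and one substitution; the length of each run would be carried into the sub-classification.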
Results 


General Articulatory Accuracy. 
The upper section of Table 1 presents 
group means for percentage of correct 
words, duration and correct word 
rate. Duration is expressed as total 
speaking time divided by number of 
text words, 55, for purposes of inter- 
comparison and the values may be 
interpreted as mean word period, in- 
cluding pause time. The lower section 
shows similar measures on the 13-word 
sentence. Data for the pre- and post- 
experimental readings are given at the 
left as a matter of interest; their 
resemblance to the 0-sec. experimental 
reading will be observed. The experi- 
mental changes in the sentence have 


been discussed thoroughly in the first 
report and there is no evidence in any 
of the measurements of the complete 
passage which necessitates revision of 
interpretation. Study of the means for 
the undelayed performances suggests 
that the proportion of correct words 
in the passage as a whole appears to 
have been somewhat higher basically 
than in the one sentence studied ear- 
lier, so that the whole set of passage 
means is at a higher level. The dura- 
tion means for the passage are also 
systematically larger than for the sen- 
tence, presumably because it included 
three between-sentence pauses. The 
various values of correct word rate 
are very similar for the two samples. 
The major point, however, is that all 
three measures of both samples varied 
by large amounts in response to 
changes in delay interval, with peak 
disturbance invariably at 0.2 sec. 


Effect of Delay Interval upon Type
of Error. The 80 experimental performances
yielded a total of 1,548 instances
of articulatory error. The
results of sorting these by delay interval
and type of error are given in
Table 2, which shows group means
and standard deviations.

Table 2. Group means and standard deviations for total number of articulatory errors and errors
of types shown.

                               Time Delay (sec)
                    0*      0†      0       .1      .2      .4      .8
Total
  M                6.00    6.56    6.75   17.56   29.88   26.38   16.19
  σ                3.30    3.32    3.91    8.60   14.87   11.35    6.23
Substitution
  M                2.56    2.31    3.06    5.38    9.44    8.31    5.50
  σ                1.97    1.76    1.85    3.22    5.60    3.37    2.37
Omission
  M                2.50    3.69    3.06    6.13    7.69    7.19    5.31
  σ                2.18    2.49    2.75    3.10    3.95    3.52    2.23
Addition
  M                 .63     .25     .44    4.31   10.12    9.31    4.31
  σ                 .86     .15     .19    4.16    6.09    6.93    4.07
Miscellaneous
  M                 .31     .31     .19    1.75    2.63    1.56    1.06
  σ                 .46     .62     .39    1.60    2.03    1.01    1.21

* Pre-experimental
† Post-experimental

The change
in the mean number of total errors 
(top row) with variation of time 
delay evidences the afore-mentioned 
increase with peak at 0.2 sec. The four 
types of error also varied in this same 
general way, but by relative amounts 
that were not similar. In other words, 
as the number of total errors varied, 
the shape of the distribution according 
to type did not remain the same. This 
is most readily apparent in Figure 1, 
which was prepared from the columns 
of means in Table 2. It will be seen 
that in the experimental condition 
which involved no delay (labeled 0₂),
as well as in the other undelayed con- 
ditions, the errors were preponder- 
antly substitutions and omissions, 
about equally divided. When the ar- 


ticulatory disturbance was at its peak 
(0.2-sec. delay), addition was the 
most common error. Whereas the
number of omissions, for example,
approximately doubled from 0₂ to 0.2,
additions became 20 times as common.

Figure 1. Distributions of major types of
articulatory errors at different time delays.
0₁ and 0₃: pre- and post-experimental readings
with 0-sec. delay. (Abscissa: interval of
delay in sec.; ordinate: mean number of errors.)

Table 3. Distributions of total substitutions at different time delays.

                         Time Delay (sec)
                     0     .1     .2     .4     .8
Total               49     86    151    133     88
Sub-Type
  Voicing Error     39     38     48     50     43
  Other             10     48    103     83     45
Phonemes
  1                 49     82    139    117     80
  2-4                0      4     12     16      8
Stress
  Unstressed        32     43     65     60     53
  Stressed          17     43     86     73     35

Comparison of the profiles in Figure 1 
indicates the nature of the change in 
shape as the severity of the general 
disturbance changed at the various 
time delays. The data indicate not only 
that severity of articulatory disturb- 
ance varies with interval of delay and 
that delay-induced errors of different 
types are not equally numerous, but 
that the number of occurrences of a 
given type of error depends upon the 
interval of delay at which the observa- 
tion is made. The most distinctive 
characteristic of peak disturbance is 
high incidence of additions.¹

¹It had been planned to test the interaction
of delay interval and type of error
by analysis of variance, but the test appears
questionable in view of the heterogeneity of
variance (see standard deviations in Table
2). It is believed that the systematic differences
between the means are sufficiently
large for confident interpretation without
formal test.

The errors of each type were sub-sorted
in various ways. The general
procedure was to tally the errors with
respect to two or more categories of a
given factor and the result was a number
of distributions among which were
some that appeared to have descriptive 
utility. These are shown in the tables 
which follow for the five conditions 
of the experiment proper. Each entry 
is the number of instances among all 
16 subjects, the sum of the entries in 
a given distribution corresponding to 
the means of Table 2. 
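
The correspondence between the table totals and the means of Table 2 is easy to verify. The sketch below uses only the totals printed in Tables 3 through 7 and the means printed in Table 2 for the five experimental conditions; the variable and function names are illustrative assumptions.

```python
SUBJECTS = 16
DELAYS_SEC = [0.0, 0.1, 0.2, 0.4, 0.8]

# Totals over all 16 subjects, from the top rows of Tables 3-7.
totals = {
    "substitution": [49, 86, 151, 133, 88],
    "omission": [49, 98, 123, 115, 85],
    "repetition": [2, 53, 117, 108, 34],
    "insertion": [5, 16, 45, 41, 35],
    "miscellaneous": [3, 28, 42, 25, 17],
}

# Additions are repetitions plus insertions.
additions = [r + i for r, i in zip(totals["repetition"], totals["insertion"])]

# Means printed in Table 2 for the same five conditions.
table2_means = {
    "substitution": [3.06, 5.38, 9.44, 8.31, 5.50],
    "omission": [3.06, 6.13, 7.69, 7.19, 5.31],
    "addition": [0.44, 4.31, 10.12, 9.31, 4.31],
    "miscellaneous": [0.19, 1.75, 2.63, 1.56, 1.06],
}


def max_discrepancy(counts, printed_means):
    """Largest gap between count/16 and the printed two-decimal mean."""
    return max(abs(c / SUBJECTS - m) for c, m in zip(counts, printed_means))


checks = {
    "substitution": max_discrepancy(totals["substitution"], table2_means["substitution"]),
    "omission": max_discrepancy(totals["omission"], table2_means["omission"]),
    "addition": max_discrepancy(additions, table2_means["addition"]),
    "miscellaneous": max_discrepancy(totals["miscellaneous"], table2_means["miscellaneous"]),
}

grand_total = sum(sum(v) for v in totals.values())
addition_ratio = additions[2] / additions[0]  # 0.2-sec. delay relative to no delay
```

Every discrepancy falls within rounding, the grand total reproduces the 1,548 instances reported, and the ratio of additions at 0.2 sec. to additions without delay is a little over 20.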


Substitutions. The results for errors 
of this type are in Table 3. Of the 108 
errors observed in the undelayed ex- 
perimental condition, 49 were substi- 
tutions. As is shown, 39 of these were 
voicing errors, or interchanges of con- 
sonant cognates, which might be re- 
garded as instances of a minimal form 
of substitution. The number of these 
simple errors increased under time 
delay, but at peak disturbance they 
were outnumbered two to one by 
more radical deviations. The latter 


were so extremely varied and unusual 
that attempts to sort them phoneti- 
cally were unproductive and it is 
considered that this unconventionality, 
which may readily be heard in casual 
listening, is itself a distinctive attribute. 

Table 4. Distributions of total omissions at different time delays.

                         Time Delay (sec)
                     0     .1     .2     .4     .8
Total               49     98    123    115     85
Phonemes
  1                 46     86    102    106     73
  2-7                3     12     21      9     12
Stress
  Unstressed        30     55     67     70     55
  Stressed          15     31     40     35     19
  Mixed              4     12     16     10     11

It is possible that most such substitutions
are to be interpreted as phonetic
anachronisms which are directly trig- 
gered by delayed feedback and that 
they have much the same significance 
as the repetitive type of addition 
which is discussed below. 

The second row of distributions in 
Table 3 refers to the ‘length’ of the 
instance of error in number of pho- 
nemes, one of the sortings that were 
carried out with all error types. It will 
be seen that only one-phoneme in- 
stances were observed in the unde- 
layed condition and that such errors 
were also most common in all condi- 
tions. The performances under delay, 
however, yielded a number of poly- 
phonemic substitutions, which aver- 
aged about one per subject at the peak. 

The remainder of the table shows
the results of sorting the substitutions
according to the stress of the syllables
in which they occurred.² Roughly
one-third of the errors in the undelayed
reading were in stressed syllables,
a number that probably differs
little from the proportion of such
syllables in the average reading. Under
time delay it will be seen that the distribution
shifted in relationship to
degree of disturbance, so that when
substitutions were most numerous,
more than one-half of them came in
syllables judged to have been stressed
in their respective spoken phrases. The
impression of stress in time delay is
that it is characteristically atypical
and that the speaker has sacrificed his
usual stress patterns as part of his
attempt to preserve phonetic accuracy.
The shift of the stress distribution
probably indicates that the speaker has
a tendency to ‘stress’ syllables in
which errors occur, in connection
with his effort to avoid them. The
auditory effect of the association of
substitution and stress is to emphasize
the phonetic unlawfulness of substitutions,
already alluded to, by increasing
their relative prominence.

²It was possible for an error, as defined,
to be polysyllabic, and all of the major
types included a few such instances. In some
of these the stress of the syllables was not
uniform, and these were termed ‘mixed’ in
sorting by stress. The category was not
needed for substitutions.


Omissions. The sub-sorting of these 
errors was unrewarding, as may be 
seen in the two sets of distributions 
shown in Table 4.

Table 5. Distributions of total repetitions at different time delays.

                         Time Delay (sec)
                     0     .1     .2     .4     .8
Total                2     53    117    108     34
First Articulation
  Accurate           0     37     74     72     26
  Inaccurate         2     16     43     36      8
Articulations
  2                  2     52    113    103     33
  3                  0      1      4      5      1
Phonemes
  1                  0     32     56     43     22
  2                  0     20     47     33      4
  3                  1      1     12     21      4
  4-10               1      0      2     11      4
Stress
  Unstressed         0     22     43     41     12
  Stressed           1     30     69     53     15
  Mixed              1      1      5     14      7

In the undelayed
performances, 46 of the 49 instances
were mono-phonemic. This continued
to be the most frequent length of 
error with time delay, as shown, al- 
though substantial numbers of poly- 
phonemic omissions occurred, consti- 
tuting about one-fifth of all instances 
at 0.2 sec. 


The practice followed in judging 
stress was different from that followed 
with other errors because of the na- 
ture of omission. When a portion of 
a syllable remained, the stress of the 
fragment was judged; when entire 
syllables were omitted, their expected 
stress was estimated. It also will be 
noted that the mixed category men- 
tioned above was necessary. The dis- 
tributions in Table 4 are similar in 
general form throughout the five con- 
ditions and there is no sign of any 
important change as the total number 
of omissions varied. 


Repetitions. The characteristically 
high incidence of additions to the nor- 


mally expected output was remarked 
above. These were divided into repeti- 
tions and insertions (non-repetitive 
additions) and the results of sub-sort- 
ing the two classes are shown in 
Tables 5 and 6, respectively. The rela- 
tive numbers of each may be com- 
pared in the top lines of the two tables. 
Among the total of 449 additions 
counted in the 64 performances under 
delay, 312, or approximately 70%, 
were classified as repetitions. This 
classification was assigned conserva- 
tively, and only when the phonetic 
resemblance between a given addition 
and the preceding utterance was un- 
questionable. Delay-induced repeti- 
tions are interpreted herein according 
to a conception of speech action in 
which the cueing stimuli for the serial 
motor responses in speaking are taken 
as supplied sequentially by the feed- 
backs of the responses themselves (2). 

Table 6. Distributions of total insertions at different time delays.

                         Time Delay (sec)
                     0     .1     .2     .4     .8
Total                5     16     45     41     35
Phonemes
  1                  4     13     33     29     21
  2-6                1      3     12     12     14
Stress
  Unstressed         4     11     33     26     25
  Stressed           1      5      9     11      6
  Mixed              0      0      3      4      4
Location
  Between Words      3      9     30     26     29
  Within Words       2      7     15     15      6

A given response cues the next response,
etc. In this view of speech
control, when the feedback of one 
response is delayed so that it coincides 
with a second response for which it 
is the stimulus, it will trigger a repeti- 
tion of the second response, if it 
dominates the feedback complex dur- 
ing the second response. 

The habitual incidence of repeti- 
tions in the speech of a small number 
of individuals, who have been studied 
extensively for that reason in part, 
and the comparative rarity of this 
type of error in the oral reading of 
unselected subjects provide special in- 
terest in this type of error. While 
reading the 55-word passage without 
time delay, 14 of the subjects yielded 
no repetitions. The other two re- 
peated once each, both times ob- 
viously to correct errors of reading. 
Under 0.2-sec. delay, the same subjects 
averaged 7.3 repetitions, produced at 
a rate of one every four seconds. No 
subject was free of repetitions, and 
one man produced 18. 

The two corrective repetitions ob- 
served in the undelayed condition 
were ‘with its past—path high above’ 
and ‘people look for—people look.’ 


Undoubtedly some of the delay-in- 
duced repetitions have similar func- 
tions, but many of them do not sound 
purposeful in the usual sense; consider, 
for example, ‘white light—light’ and 
‘these take the—the shape.’ The first 
row of distributions in Table 5 gives 
the results of an attempt to make a 
division that would bear on this point. 
The basis of sorting was the accuracy 
or inaccuracy of the first articulation 
of the repeated portion. It was rea- 
soned that corrective repetitions would 
tend to come within the class with in- 
accurate first articulation (although 
not all of that class are of that type) 
and that the class with accurate first 
articulation would be composed large- 
ly of non-corrective repetitions (al- 
though it would not include all of 
them). The distributions show that 
most of the instances were in the lat- 
ter class. This result is viewed as 
support for the auditory impression 
that delay-induced repetitions ‘sound
as if’ a large proportion of them are
direct and automatic responses to misinformation
in the feedback complex,
rather than corrections of other earlier
errors. This interpretation is supported
inferentially by the very numerous
errors of other types that were
not immediately followed by repetitions.

Table 7. Distributions of total miscellaneous errors at different time delays.

                         Time Delay (sec)
                     0     .1     .2     .4     .8
Total                3     28     42     25     17
Shifted Juncture     2     16     20     18     10
Slighting            1     12     22      7      7

The second section of Table 5 gives 
the results of counting the number of 
times the repeated portion was articu- 
lated in each instance. This is perti- 
nent to the nature of the delay-in- 
duced repetition. Let a train of speech 
responses A, B, B’, C and their respec- 
tive feedbacks a, b, b’, c be assumed, 
with B’ denoting an unintended repeti- 
tion of B. Let it also be assumed that 
significant portions of the feedbacks 
are delayed by a time interval such 
that the delayed portion of a coincides 
with B and that the train of responses 
is adequately periodic so that the ef- 
fective delay is also one response 
thereafter. B’ is interpreted as a re- 
sponse to the delayed portion of a, 
with a dominating the undelayed por- 
tion of b during B. The delayed 
portion of b then coincides with B’ 
and with the undelayed portion of b’.
Since b resembles b’, the feedback 
complex during B’ triggers C. The 
system does not repeat B again because 
the first repetition temporarily restores 
the normal phase relationships of out- 
put and feedback, even though two 
different versions of the same action 
are involved. It will be understood 
that speech action is sufficiently aperi- 


odic and output amplitudes sufficient- 
ly variable that conditions such as 
were assumed to produce B’ do not 
prevail for long periods of time. If 
this were not so, if the system con- 
tinued to make the effort to produce 
its usual output, and if the delayed 
feedback invariably dominated, then 
it would be seen that a delay of feed- 
backs by one response would yield 
A, B, B’, C, C’, D, D’, etc.
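
The argument of this paragraph can be followed step by step in a toy simulation. The sketch below is an idealization under the stated assumptions only: a strictly periodic train of responses, a delay of exactly one response, and each feedback cueing the response that follows its source. The function name and the `delayed_dominates` callback are hypothetical devices for the illustration.

```python
def simulate_delayed_feedback(plan, delayed_dominates):
    """Return the produced sequence for an intended sequence `plan`.

    delayed_dominates(step) -- True if, during the response at output
    position `step`, the delayed feedback dominates the undelayed one.
    """
    output = []
    current = 0        # index in plan of the response now being produced
    previous = None    # index in plan of the response produced just before
    while current < len(plan):
        is_repeat = previous == current
        output.append(plan[current] + ("'" if is_repeat else ""))
        # Undelayed feedback of the current response cues the next unit.
        cue_undelayed = current + 1
        # Delayed feedback of the previous response cues the unit after it.
        cue_delayed = None if previous is None else previous + 1
        conflict = cue_delayed is not None and cue_delayed != cue_undelayed
        if conflict and delayed_dominates(len(output) - 1):
            nxt = cue_delayed      # the current response is triggered again
        else:
            nxt = cue_undelayed    # normal progression
        previous, current = current, nxt
    return output


# Delayed feedback dominates throughout: every response after A is doubled.
always = simulate_delayed_feedback(list("ABCD"), lambda step: True)

# Delayed feedback dominates only during B: a single repetition, then recovery.
once = simulate_delayed_feedback(list("ABC"), lambda step: step == 1)
```

After a repetition the delayed and undelayed feedbacks both cue the same next response, so in this idealization a unit is never articulated a third time, which is the point made in the text about the restoration of normal phase relationships.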

Table 5 shows that almost all repeti- 
tions were simple, two-articulation er- 
rors, (B, B’), and that the remaining 
few involved three articulations (B, 
B’, B”). Longer repetitions have been 
observed at other times, however; 
they are uncommon, appear to be 
person-linked, and give the impression 
of wild and uncontrolled oscillation 
of the vocal mechanism. 

It was noted above that most sub- 
stitutions and omissions were mono- 
phonemic, although longer errors be- 
came more numerous as disturbance 
increased. The situation is different 
for repetitions, as the next distribu- 
tions in Table 5 show. First, poly- 
phonemic errors outnumbered mono- 
phonemic errors at both 0.2 and 0.4 
sec. delay. Second, the 0.4-sec. delay 
interval elicited about three-fifths of 
all long (3-10 phonemes) repetitions. 
At that delay, 30% of the errors were 
long; at 0.2 sec., 12%. This is the only 
classification of error anywhere in the
data which yielded strong indication
of peak incidence elsewhere than at 
0.2 sec. The finding is consistent with 
the suggestion made elsewhere that 
‘units of speech control’ should not 
be ‘identified with any of the conven- 
tional units such as the phoneme (2).’ 
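
The proportions cited in this paragraph follow directly from the phoneme rows of Table 5. The short computation below restricts itself to the four delayed conditions; the names used are illustrative assumptions, and 'long' means 3 to 10 phonemes as in the text.

```python
DELAYS_SEC = [0.1, 0.2, 0.4, 0.8]

# Repetitions by length in phonemes at each delay (Table 5).
by_length = {
    "1": [32, 56, 43, 22],
    "2": [20, 47, 33, 4],
    "3": [1, 12, 21, 4],
    "4-10": [0, 2, 11, 4],
}

mono = by_length["1"]
poly = [a + b + c for a, b, c in
        zip(by_length["2"], by_length["3"], by_length["4-10"])]
long_reps = [a + b for a, b in zip(by_length["3"], by_length["4-10"])]
totals = [m + p for m, p in zip(mono, poly)]

# Percentage of repetitions at each delay that were long.
pct_long = [100.0 * n / t for n, t in zip(long_reps, totals)]

# Share of all long repetitions under delay that occurred at 0.4 sec.
share_at_04 = long_reps[2] / sum(long_reps)
```

Poly-phonemic repetitions exceed mono-phonemic ones at 0.2 and 0.4 sec. only, and the long repetitions are concentrated at 0.4 sec. rather than at the general peak of disturbance.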

The final section of Table 5 presents 
the stress classifications of repetitions. 
In each instance the category was de- 
termined by the stress of the repeated 
portion at the time of its first articula- 
tion. Only rarely did the stress of the 
second or third articulations differ. It 
will be seen that the first articula- 
tion was stressed more often than un- 
stressed, an association similar to that 
reported for substitutions. It seems 
unlikely that this indicates that stressed 
syllables are vulnerable to repetition 
because they are stressed. As a matter 
of fact, if there is any difference, an 
unstressed syllable should be more 
vulnerable because the level of its un- 
delayed auditory feedback is ordi- 
narily low (by definition) and more 
subject to masking by the delayed 
signal. As has been said above, it is 
believed that heightened ‘stress’ is a 
part of the effort to evade articula- 
tory error. The association of repeti- 
tion and obtained stress is interpreted 
as reflecting a tendency for both to 
occur at times when control of speech 
action is most precarious. 


Insertions. Table 6 is devoted to the 
non-repetitive additions. The instances 
ranged from complete words added to 
the text to unexplainable, seemingly 
random articulations within words. 
The general tabulation indicates that 
occurrence varied with time delay in 
the familiar manner, with the average 
subject producing about three in- 
stances at the peak. As may be seen, 
most insertions were mono-phonemic, 


but longer instances became more fre- 
quent as the delay interval was in- 
creased. Unstressed insertions predom- 
inated strongly, and with this type of 
error, the stress of the error itself, or 
of the syllable in which it occurred, 
was noted. Most insertions occurred 
between words in all experimental 
conditions. 


Miscellaneous Errors. Two kinds of 
errors, neither of high frequency, 
were combined in this category. The 
first has been termed shifted juncture; 
an example is [raʊnˈdɑrtʃ] for round
arch. The other was the sort of semi- 
omission known as slighting. The total 
number of these kinds of errors in all 
conditions was 115, about as many as 
the number of repetitions under 0.2- 
sec. time delay. For the sake of com- 
pleteness, they are shown in Table 7, 
but no sub-sorting was attempted. 


Summary 


Sixteen young men read a prose pas- 
sage five times each. The time delay 
of amplified auditory feedback dif- 
fered at each reading, the values being 
0, 0.1, 0.2, 0.4 and 0.8 sec. and the 
performances were recorded. The 
articulatory disturbances were an- 
alyzed and described. 


1. In agreement with the previous report,
the general effect of time delay
was to reduce the number of correct 
words, increase the total reading time 
and retard the correct word rate. Dis- 
turbance was maximal when the delay 
was 0.2 sec. 


2. Severity of articulatory disturb- 
ance, estimated by number of in- 
stances of error, varied substantially 
with both delay interval and type of 
error; interaction of interval and type
was also large. In other words, delayed
auditory feedback not only induces
articulatory disturbances, but
selectively varies the number of dis- 
turbances of certain types in relation 
to the specific interval of delay. 

3. The substitutions induced by de- 
lay tended to involve improbable 
phonetic elements, to be mono-pho- 
nemic, and to occur in stressed syll- 
ables. The latter relationship, which 
apparently is based on a tendency to 
increase vocal effort on syllables in 
which errors occur, in an attempt to 
avoid them, increases the listener’s im- 
pression that most substitutions are 
phonetically unlawful. 

4. Delay-induced omissions were 
high in frequency of occurrence and 
fairly substantial numbers of them 
were poly-phonemic at the point of 
peak disturbance, but otherwise they 
were unremarkable. 

5. High incidence of additions was 
the most distinctive characteristic of 
the peak disturbance and about 70% 
of the additions were repetitive. The 
repetitions were not predominantly of 
the corrective type, such as those 
heard occasionally in free speech, but 
for the most part appeared to be un- 
purposeful responses to stimuli in the 
delayed feedback. Almost all repeti- 
tions were simple double articulations 


and it was pointed out that this would 
be expected if the second articula- 
tion temporarily restores the normal 
output-feedback relationship. At 0.2 
and 0.4 sec., poly-phonemic errors 
were in the majority and the length 
of the portion repeated varied directly
with the delay interval: short
errors peaked at 0.2 sec., long errors
at 0.4 sec. The first articulation of the 
repeated portion of the utterance was 
more frequently stressed than un- 
stressed, seemingly indicating that 
both error and increased effort tended 
to occur at times of precarious con- 
trol. 


6. Most of the insertions, or non- 
repetitive additions, were mono-pho- 
nemic, unstressed and occurred be- 
tween words. 


7. Two other forms of error, 
shifted juncture and slighting, varied 
with time delay in the manner of the 
other types, but were considerably 
less common.


Artificial Speech In Phonetics And Communications


Hugh K. Dunn 
Harold L. Barney 


Men have found a number of ways of 
producing, in an entirely artificial 
manner, sounds which are recognized 
by a listener as imitating speech. In 
some, only the separate sounds of 
speech are attempted. In others, whole 
sentences are produced, either by an 
operator using manual controls, or 
mechanically by a program device 
having prepared instructions. 

One purpose of producing speech 
artificially is the amusement or amaze- 
ment of the listeners. The synthetic 
method, however, is always one way 
to learn more about a real process. 
This has been, and still is, the principal
use of artificial speech. It is an
attractive method in phonetics re- 
search, because of the possibility of 
exact control of factors such as fre- 
quency and timing, or the position 
and motion of simulated articulators. 
One interesting goal might be an in- 
strument, for which exact instructions 
could be given to cause it to make any 
desired speech sound, in any language. 
This could be a standard, which 
would not vary with place or time. 
A set of instructions would offer some
advantages, in distribution and preservation,
over phonograph records.

Artificial speech now has another 
use which promises to become of 
economic importance in the com- 
munications field. This use is the 
rebuilding of speech which has been 
reduced to its essential parameters, for 
transmission in narrow frequency 


bands. 


History. The documented history 
of artificial speech begins in the latter 
half of the eighteenth century. Al- 
though there is some mention of 
much earlier attempts, we have no 
assurance that the devices that seemed 
to work well were not simply fakes, 
in which real speech was covertly in- 
troduced through a tube. In the pe- 
riod 1779 to 1791, there are two 
prominent names in the field, and 
there were probably others. Kratzen- 
stein (37) made resonator shapes 
which, when activated by reeds, pro- 
duced five major vowel sounds. The 
talking device of von Kempelen (34) 
was more elaborate. For vowels, an 
open resonating chamber was partly 
closed and otherwise modified in use 
by one of the hands, and finger- 
operated levers caused several con- 
sonants to be produced. Some simple 
sentences were possible. 

More detailed accounts of the early 
history will be found elsewhere, as for
example in the book by Sir Richard
Paget (47). The von Kempelen story
has been reviewed in recent times by
Dudley and Tarnoczy (16). We note 
here only that similar direct mechani- 
cal and acoustical imitations of the 
human voice continued to be studied 
by various investigators, through the 
nineteenth century and into the 
twentieth. Paget himself, in the 1920s, 
did some outstanding experimentation 
in shaping model resonators of plas- 
ticine, for producing the sounds of 
different vowels. 

An entirely different method was 
introduced by Helmholtz (29). In- 
stead of using acoustical resonators for 
producing the vowel formants, he 
built up these formants through the 
use of separate tuning forks, of the 
right frequencies and amplitudes to 
replace the harmonics of the human 
vocal cords, as modified by the 
resonant cavities. Koenig (35) got 
this effect by using a siren made up 
of multiple toothed wheels, with air 
streams acting on those wheels having 
the correct number of teeth to give 
the desired partial tones. Miller (42) 
and Stumpf (67) used sets of small 
organ pipes for the purpose. 

The use of electrical analogues was 
begun by Stewart (60) in 1922. He 
used two resonating electrical circuits, 
which represented the acoustical reso- 
nators of the vocal tract. When these 
were actuated by an interrupted cur- 
rent (at a rate similar to that of the 
vocal cords), and the two resonator 
outputs were combined and made 
audible in a telephone receiver, differ- 
ent vowels were heard as the tuning 
of the resonators was changed. 


Use in Phonetics Research 


The earlier work in artificial speech 
has been sketched above only briefly.
We will be more specific about
modern efforts which have had the
advantage of electronic techniques,
and some of which have profited
through the more extensive analyses
of real speech made available by the
sound spectrograph (48, 36).

Figure 1. The controls of the Voder of
Dudley, Riesz and Watkins (15), and comparison
with the real vocal apparatus. The
numbered tabs are finger controls for the
outputs of ten band-pass filters.


The Voder. Probably the first elec- 
trical speech maker which attempted 
to put sentences together was the 
Voder (Voice Operation DEmonstratoR)
of Dudley (12, 15). It was
formed by separating the receiving 
end of the Vocoder (11) and giving 
it manual and pedal controls. The 
Voder process, shown diagrammatic- 
ally in Figure 1, starts with the elec- 
trical generation of two types of com- 
plex vibration. One is a ‘buzz,’ repre- 
senting the larynx tone in voiced 
sounds. It consists of a fundamental
of variable pitch, plus a large number
of harmonically related overtones.
The other source is a ‘hiss,’ consisting
of random noise, and is used in unvoiced
sounds.

Figure 2. An operator at the controls of
the Voder, with which she can make artificial
connected speech.

One or the other of these sources, 
or both, is applied to the inputs of a 
group of band-pass filters, which to- 
gether cover the speech range of fre- 
quencies. The output of each filter 
has a separate control of amplitude, 
before these outputs are combined to 
form the synthetic speech sound. 
Since, in general, several harmonics of 
the buzz tone will pass each filter, the 
synthesis of the desired speech wave 
is performed in a sort of ‘lumped’ 
fashion. The narrower the filters, the 
closer this method will approach that 
of the summation of pure tones, in 
the imitation of voiced sounds; but
the correspondingly increased number
of filters makes the control task of the
operator more complicated. The addi- 
tion of the random source makes 
possible the synthesis of unvoiced 
sounds, including whispered vowels. 
Stops are added by the use of timing 
controls. 
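
The ‘lumped’ character of this synthesis can be shown with a small numerical sketch of the voiced case. It is an idealization, not a description of the Voder circuits: the filters are treated as ideal contiguous bands, every harmonic of the buzz is given unit amplitude before filtering, the hiss source and timing controls are omitted, and the band edges, gains and function names are hypothetical values chosen for illustration.

```python
import math


def band_gain_for(freq_hz, band_edges_hz, band_gains):
    """Gain of the idealized band containing freq_hz (0.0 if none does)."""
    for low, high, gain in zip(band_edges_hz, band_edges_hz[1:], band_gains):
        if low <= freq_hz < high:
            return gain
    return 0.0


def buzz_partials(f0_hz, band_edges_hz, band_gains):
    """Harmonics of the buzz, each weighted by the gain of its band."""
    partials = []
    k = 1
    while k * f0_hz < band_edges_hz[-1]:
        freq = k * f0_hz
        partials.append((freq, band_gain_for(freq, band_edges_hz, band_gains)))
        k += 1
    return partials


def synthesize(partials, sample_rate_hz=16000, n_samples=160):
    """Sum the weighted harmonics into a short frame of samples."""
    return [
        sum(g * math.sin(2.0 * math.pi * f * n / sample_rate_hz)
            for f, g in partials)
        for n in range(n_samples)
    ]


# Two hypothetical bands; a 100-cps buzz puts seven harmonics in each.
edges = [0, 750, 1500]
gains = [1.0, 0.5]
partials = buzz_partials(100, edges, gains)
frame = synthesize(partials)
```

All harmonics that fall within one band necessarily share that band's gain, which is why narrower filters bring the method closer to a summation of individually controlled pure tones, at the cost of more controls for the operator.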

The Voder was demonstrated to 
great numbers of people during the 
New York and San Francisco World’s 
Fairs, in 1939 and 1940. The operators, 
one of whom is shown at the Voder 
controls in Figure 2, went through a 
lengthy period of training before 
these demonstrations. They not only 
produced sentences previously prac- 
ticed, but to some extent answered 
questions via the Voder. Only ten 
filters were used, and the speech 
quality was not as good as is possible 
with more and narrower filters. 

For phonetics, the Voder can give 
information on the frequency struc- 
ture of speech sounds. From the con- 
nected speech feature, moreover, a 
great deal can be learned about how 
sounds are put together. The Vo- 
coder, also, can be a source of pho- 
netics information. Although it starts 
with real speech, this is reduced to a 
set of simple parameters, and the 
speech may be remade with frequen- 
cies and amplitudes changed from 
those of the original, although the 
timing of the real speech is retained. 
A recent study of speech, using the 
Voder principle but without the con- 
nected speech feature (except in tape 
recordings pieced together), has been 
made by Oizumi and Kubo (45). 


Simple Electrical Resonators. In this 
method, the ‘buzz’ and ‘hiss’ sources 
remain the same, but the multiple 
band-pass filters of the Voder are 
replaced by a smaller number of electrical
resonating circuits. This is obviously
an extension of the method
pioneered by Stewart (60), and it has 
been carried on in many laboratories. 

An example may be given in the 
circuit of Wagner (62). He used four 
electrical resonators which were con- 
nected in parallel, in the sense that his 
buzz-like source was applied directly 
to all the resonator inputs, and that 
the outputs were recombined to pro- 
duce the desired vowel sound. An ob- 
jection may be made to this connec- 
tion, that it does not imitate closely 
enough the real vocal resonators, 
which for non-nasal vowels are essen- 
tially in series (17). 

The thought behind the use of
resonators in parallel is that a wave 
like the known wave of a vowel sound 
could be built up by the adjustment 
of peak frequencies and amplitudes of 
the resonances. To produce a good 
vowel sound, however, attention must 
be paid also to the amplitudes of har- 
monics lying between the peaks. 
These cannot be made right by ad- 
justment of resonator damping alone. 
Weibel (63) has shown mathemati- 
cally that parallel resonators can be 
made to produce the same output 
spectrum as series resonators, provided
the outputs of alternate resonators
(in order of increasing frequency)
are reversed in phase, and all
amplitudes and damping constants 
correctly adjusted, before the differ- 
ent outputs are combined. It is inter- 
esting to note that Wagner (62) 
placed reversing switches in all his 
resonator outputs, and used them 
empirically to get the best effects, but 
he leaves no record of the phasing 
found best. 
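
Why the phasing matters can be seen in a small numerical sketch. It does not reproduce Weibel's derivation; it only compares the summed output of two parallel resonators with and without reversal of the second. The resonator formula (a simple second-order response with unity gain at zero frequency), the centre frequencies of 500 and 1500 cps, and the 100-cps bandwidth are all assumptions made for illustration.

```python
def resonator(f, centre, bandwidth=100.0):
    """Complex response of a simple second-order resonator (gain 1 at f = 0)."""
    return centre ** 2 / complex(centre ** 2 - f ** 2, bandwidth * f)

def parallel(f, centres, alternate=False):
    """Magnitude of the summed outputs; optionally reverse alternate resonators."""
    total = 0j
    for k, centre in enumerate(centres):
        sign = -1.0 if (alternate and k % 2 == 1) else 1.0
        total += sign * resonator(f, centre)
    return abs(total)
```

Between the two peaks the first resonator is above its resonance and the second is below, so their outputs are nearly opposite in phase. Added directly they partly cancel and leave a valley; with the second output reversed they reinforce, which raises the harmonics lying between the peaks.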


Electrical resonators in series (that 
is, with the output of the first going 
to the input of the second, and so on) 
have also been used. A speech synthesizer
at the Massachusetts Institute
of Technology (55, 56) uses this connection.
The advantage of the series
arrangement is that independent adjustment
of resonator amplitudes is
unnecessary, assuming that a suitable, 
fixed damping has been given to each 
resonator. Relative peak amplitudes 
will be found to depend upon the 
adjustment of frequencies. The am- 
plitudes will also be found to match 
those of the real voice, except that 
the highest resonances used will tend 
to be low in amplitude. Thus, the 
second and third will be low, unless 
a fourth and perhaps a fifth resonance 
is used (18). In the M.I.T. synthesizer 
three resonators of variable frequency 
are used, and a fourth is fixed at 
3500 cps to hold up the amplitudes in 
the others. This synthesizer has been 
provided with electrical control cir- 
cuits which are actuated by prepared 
punched tapes, giving 10 different 
vowels. With 10 consonants also pro- 
vided, partly through auxiliary cir- 
cuits, connected speech can be pro- 
duced. 
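
The effect of the fixed fourth resonator can be checked with a minimal sketch of a series connection. The magnitude formula below is the standard response of a conjugate pole pair normalized to unity at zero frequency; the variable formant positions (500, 1500, 2500 cps) and the 100-cps bandwidth are assumed values, not those of the M.I.T. synthesizer.

```python
import math

def resonator_gain(f, centre, bandwidth=100.0):
    """Magnitude response of one resonator, equal to 1 at f = 0."""
    half = bandwidth / 2.0
    num = centre ** 2 + half ** 2
    den = math.sqrt(((f - centre) ** 2 + half ** 2) *
                    ((f + centre) ** 2 + half ** 2))
    return num / den

def cascade_db(f, centres):
    """Level in decibels after passing through all resonators in series."""
    gain = 1.0
    for centre in centres:
        gain *= resonator_gain(f, centre)
    return 20.0 * math.log10(gain)
```

Comparing `cascade_db(2500.0, [500.0, 1500.0, 2500.0])` with the same call after appending `3500.0` shows the third peak rising when the fourth resonator is present, because each resonator boosts all frequencies somewhat below its own resonance. No amplitude control appears anywhere in the sketch; the peak levels follow from the frequencies alone.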

A more complete discussion of the 
different properties of parallel and 
series resonators, and their relations 
to the real vocal system, has been 
given by Flanagan (23). It is sug- 
gested that the parallel connection is 
more suitable for the production of 
some of the consonant sounds. A com- 
bination circuit has been proposed 
(24). 

Correlation of the control settings 
of the resonators with the actual 
sounds heard provides much speech 
information. As an example of the use 
of electrical resonators in a different 
type of phonetics research, Flanagan 
has determined difference limens for 
frequency (21) and for amplitude
(22), when these artificial formants 
are presented to the ear.

Figure 3. The tone-wheel form of pattern
playback used by Cooper, Liberman
and Borst (6), from which either natural
or artificial sound spectrograms can be
reproduced.

The Pattern Playback. It is probable
that readers of this article are
acquainted with the type of pattern 
called ‘visible speech,’ produced by 
the sound spectrograph (48, 36). It 
is a two-dimensional portrayal of 
analyzed sound, with time in one di- 
rection and frequency in the other, 
but with a third quantity (intensity) 
shown by the darkness of the pattern 
at any point. The pattern playback is 
a means for reconverting such a dis- 
play into the original sounds. When 
patterns drawn by hand are substi- 
tuted for those obtained from the 
analysis of real speech, the pattern 
playback becomes an artificial speech 
device. 


In one form of the playback de- 
scribed by Schott (51), a line of light 
extending through the frequency di- 
mension shines through the pattern of 
varying density. Photoelectric cells 
behind the pattern then become the 
controls of a set of Voder filters, the 
frequency position of each cell, on 
the pattern, corresponding to that of 
the filter controlled. A separate track 
on the pattern can be used for switch- 
ing between buzz and hiss and for 
controlling the fundamental fre- 
quency of the buzz. As the pattern 
moves in the time dimension, con- 
nected speech is produced. 

Another form of the pattern playback
is attributed to Cooper (4, 6). A
diagram is shown in Figure 3. The 
buzz fundamental and harmonics are 
generated optically by shining the 
light through the rotating tone wheel, 
which consists of variable density film 
having 50 different sets of sine waves 
recorded concentrically on it. The 
50 harmonics in the modulated light 
beam fall upon the pattern (spectro- 
gram) at the appropriate frequency 
places, and are either transmitted 
through or reflected from the pattern 
into a single photoelectric cell. The 
pattern itself takes the place of filters. 
If the synthetic pattern is made with 
wide strokes, several harmonics are 
reproduced for each stroke, and the 
effect is the same as that of a wide 
band in the Voder. With fine-grained 
patterns, synthesis with single har- 
monics can be achieved. In either 
case the pitch is limited to a mono- 
tone, corresponding to the speed of 
rotation of the tone wheel and the 
number of sine periods in the smallest 
circle of the wheel. A random noise 
source is not provided. Unvoiced 
sounds can be approximated by pass- 
ing a number of harmonics in upper 
frequency regions, with low har- 
monics excluded, and by breaking up 
the pattern into small parts in the 
time dimension, in an irregular man- 
ner. Intelligible connected sentences 
can be produced. 
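
The principle of the tone-wheel playback, in which the pattern itself acts as the filter, can be sketched in a few lines. The sketch assumes a fundamental of 120 cps, frames of 10 milliseconds, and a pattern supplied as a table of transmission values between 0 and 1; these values and the function name are illustrative and are not taken from the instrument.

```python
import math

def playback(pattern, f0=120.0, frame_s=0.01, rate=16000):
    """pattern[t][k] is the transmission (0..1) of harmonic k+1 during frame t."""
    per_frame = round(frame_s * rate)
    out = []
    for t, row in enumerate(pattern):
        for i in range(per_frame):
            n = t * per_frame + i
            # Each harmonic of the fixed fundamental is weighted by the pattern.
            out.append(sum(a * math.sin(2 * math.pi * (k + 1) * f0 * n / rate)
                           for k, a in enumerate(row) if a))
    return out
```

Because `f0` never changes, the output is a monotone, as in the real device. A wide painted stroke corresponds to several adjacent nonzero entries in a row, and a fine-grained pattern to a single one.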

The tone-wheel type of pattern 
playback has been used in extensive 
phonetics research at Haskins Labora- 
tories, New York (5, 8, 9, 10, 41). 
The types of information they obtain 
by these purely synthetic methods 
include the number and frequency 
positions of formants required for 
vowel recognition, and the role of 
both bursts of high-frequency energy 
and of vowel transitions as acoustic 
cues for the recognition of consonants.
A general summary of the field
has been written by Borst (1), and an
interpretation of results by Liberman 
(40). 


Heterodyned Formants. The speech 
maker of Lawrence (39) called PAT 
(Parametric Artificial Talking-device) 
achieves much the same result as if 
resonating circuits had been used, but 
actually the formants are produced 
by a different process. The source is 
a buzz, having the desired fundamen- 
tal frequency and harmonics. How- 
ever, the rate of amplitude decrease 
of the harmonics with increasing fre- 
quency is not like that of the real 
vocal cord tone, but more like that of 
a formant centered at the fundamental 
frequency. Thus the formant shape is 
produced at the beginning, along with 
the fundamental. Then by the use of 
four oscillators and a modulation- 
demodulation process, the formant is 
shifted simultaneously to three desired 
frequency positions. Each formant 
then contains partial tones spaced at 
the fundamental interval, and they are 
made exact harmonics of the funda- 
mental by cutting off all four oscil- 
lators momentarily at the beginning 
of each period of the buzz. For un- 
voiced sounds a random noise, or ‘hiss,’ 
is substituted for the periodic modula- 
tion tone. 
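
The frequency shift can be understood from the product-to-sum identity. As a simplified sketch (a single ideal multiplier rather than the four-oscillator modulation-demodulation chain of the actual machine), let the buzz be written as a sum of harmonics of the fundamental f_0 with amplitudes a_k, and let F be the frequency to which the formant is to be moved. The symbols are assumptions introduced here for illustration.

```latex
b(t) = \sum_{k} a_k \cos(2\pi k f_0 t), \qquad
b(t)\cos(2\pi F t) = \tfrac{1}{2}\sum_{k} a_k
  \left[\cos 2\pi (F + k f_0) t + \cos 2\pi (F - k f_0) t\right]
```

The products lie at F ± k f_0, so they are spaced at the fundamental interval and centered on F, but they are harmonics of f_0 only when F happens to be a multiple of f_0. Restarting the oscillators at the beginning of each buzz period makes the whole wave repeat at the fundamental period, which forces the partials onto exact harmonics.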

Lawrence’s machine uses six con- 
trols, the time variations of which 
produce connected speech. These 
controls are the frequencies of the 
fundamental and the three formants, 
and the amplitudes of buzz and hiss. 
The desired variations of these quanti- 
ties are painted as contours of opaque 
sections on a glass slide, which is then 
scanned by a ‘flying spot’ from a 
cathode ray oscilloscope. The currents 
generated in a photoelectric tube then 
actuate the controls. A sentence some
two or three seconds long (at normal
speech rates) can be produced from 
one slide. The result is not only in- 
telligible, but reproduces also some 
of the individual characteristics of the 
voice from the analysis of which the 
control slide was patterned. 

In September, 1952, PAT was dem- 
onstrated at the Institution of Elec- 
trical Engineers, in London, and has 
since appeared in a number of places. 
Points of phonetic interest have been 
included in some of these demonstra- 
tions. It is shown, for example, that a 
word in a sentence can be emphasized 
in three different ways: By a raised 
fundamental frequency, by increased 
amplitude, and by longer duration. 
Moderate changes in relative formant 
amplitudes or in formant damping 
affect speech quality very little. In a 
recent study (2), it was shown that 
the exact vowel perceived in an iso- 
lated syllable depends upon the fre- 
quency ranges of the formants in the 
vowels of a preceding sentence, and 
that the extent of the dependence is 
a function of the time interval be- 
tween sentence and syllable. A more 
extended discussion of these experi- 
ments is given in reference (38). 


Vowels by the 100-Tone Generator. 
The 100-tone generator was described 
in an abstract by Wente, Lovell and 
Muller (64) and was demonstrated by 
Fletcher (25) to the American Physi- 
cal Society in 1946. It can be used to 
synthesize any continuing harmonic- 
ally constructed sound, and vowels 
fall in this class. The 100 harmonically 
related sine wave tones are recorded 
magnetically in separate tracks on a 
single rotating drum, and each track 
is provided with an electro-magnetic 
pickup. The output amplitude of each 
tone can be separately adjusted on 
the control board, which is pictured
in Figure 4. The configuration of the
control knobs shows the amplitude vs.
frequency curve of the output, amplitude
being on a logarithmic scale.
The setting shown in Figure 4 indicates
two formants, which will be
heard as a vowel.

Figure 4. The control panel of the
100-tone generator. The configuration of white
control knobs indicates that an artificial
vowel of two formants is being created.


This method of building up vowels 
will be recognized as that of Helm- 
holtz (29) and of D. C. Miller (42), 
although with much more convenient 
controls than were available at the 
time of the earlier investigators. R. L. 
Miller (43) has used the instrument in 
a study of synthetic vowels, in which 
he made wide variations in the fre- 
quencies of the fundamental and of 
the first and second formants. The 
sound resulting from each combina- 
tion of settings was recorded on a 
magnetic tape, with naturally spoken 
consonants recorded on either side to 
form a syllable. The syllables were 
then reproduced one after another to 
a group of listeners, who wrote down 
the vowel they thought they heard in 
each syllable. Tests were also made 
of formant amplitude variation, and 
of the addition of a third formant. 

When the frequencies of formants 
one and two were plotted against 
each other, the different vowels ob- 
served were found to occupy certain 
areas of the plot, with boundaries 
somewhat uncertain and shifting toward
higher formants with higher
fundamental frequencies. Some of the
vowels [u, ɔ, ɑ] were apparently heard
as single resonances, although the nat- 
ural sounds always contain two reso- 
nances moderately close together. In 
other cases, chiefly front vowels, a 
third formant added significantly to 
the percentage of recognition by the 
observers. It was also found that a 
listener’s evaluation of a given sound 
was influenced somewhat by the par- 
ticular sounds preceding it. 


The Electrical Vocal Tract. A new 
type of vowel maker called the Elec- 
trical Vocal Tract (EVT) was intro- 
duced by Dunn (17) and Schott (52). 
It has a rather fundamental difference 
from the other electrical devices 
which have been described. These 
other speech makers are designed to 
produce an air wave like that known, 
from analyses, to be produced by real 
speech. The controls usually adjust 
directly the frequencies and ampli- 
tudes of formants. The EVT, on the 
other hand, is designed to be an ana- 
logue of the vocal tract itself. The 
controls simulate the motions of the 
real articulators. The formant fre- 
quencies and amplitudes are not di- 
rectly set, but follow naturally from 
the articulator settings.

Figure 5. A model of the real speech tract,
in which cavities and constrictions are
represented as cylinders.


The design of the EVT is based on 
the simplified model of the vocal 
apparatus shown in Figure 5. The
cavities and constrictions are represented
as cylindrical tubes, in which
the acoustic constants of mass and 
compliance are distributed along the 
lengths of the cylinders. This concept 
leads to resonant frequencies, not just 
one per cavity, but many. The old 
notion of a separate cavity for each 
speech formant is thus abandoned. 


Figure 6. The electrical analogue of the
model of Figure 5, in which cylinders are
replaced by transmission lines. This is the
circuit used in the Electrical Vocal Tract
of Dunn (17).


The electrical analogue of a cylin- 
drical tube is a transmission line, in 
which inductance and capacitance are 
distributed along the line in the same 
way as the distribution of acoustical 
constants along the tube. Furthermore, 
when sections of transmission lines of 
different constants are coupled to- 
gether, the interactions follow the 
same laws as those of cylinders of 
different sizes coupled as in Figure 5. 
Thus an analogue of the system of 
Figure 5 is the electrical circuit of 
Figure 6. The volume velocities i, and 
i in Figure 5 become electrical cur- 
rents in Figure 6. The cavities are 
represented by transmission lines, but 
it was found that the constrictions 
formed by tongue and lips could be 
safely ‘lumped’ as acoustical masses, 
represented in the analogue by in- 
ductances. 

For practical reasons, it is necessary 
to replace a continuous piece of trans- 
mission line by a series of small sec- 
tions of inductance and capacitance. 





Figure 7. The controls of the Electrical 
Vocal Tract of Schott (52) are seen in this 
photograph. The central panel can be 
moved bodily to right or left, thus changing 
the position of the tongue constriction in 
the tract. 


In the EVT of Dunn and Schott these 
sections each represent a cylinder 
0.5 cm long and 6 sq. cm in cross- 
sectional area. Twenty-four of the 
sections are used (most of them not 
shown in Figure 6), but several at 
each end may be removed from the 
circuit as desired. The constriction at 
the lips is represented by the variable 
inductance at the right of Figure 6. 
This inductance is mounted behind 
the fixed panel at the right in the 
photograph, Figure 7. The variable 
inductance representing the constric- 
tion formed by the tongue is on the 
sliding panel in the center of Figure 7 
and can be inserted between any two 
sections of the line, as indicated in 
Figure 6. This inductance therefore 
divides the line into simulated cylin- 
drical cavities. Either of the constric- 
tions can be reduced to zero when 
desired. A deficiency of the arrange- 
ment is that (for simplicity of con- 
trols) it is not possible to alter the 
simulated cross-sectional areas of the 
cavities. For some vowels this does 
not matter. For others, it is found 
possible to correct the deficiency, as 
far as acoustical quality is concerned,
by adding a small amount of a second
intermediate constriction. 
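
The constants of one line section follow from the cylinder it represents. A minimal sketch, assuming a lossless tube, an air density of 0.00114 g per cu. cm and a sound velocity of 35,000 cm per second (values not given in the text): the acoustic mass of the section corresponds to the series inductance, and the acoustic compliance to the shunt capacitance.

```python
import math

RHO = 1.14e-3    # g/cm^3, assumed density of warm, moist air
C_SOUND = 3.5e4  # cm/s, assumed velocity of sound in the tract

def section_constants(length_cm=0.5, area_cm2=6.0):
    """Acoustic mass and compliance of one short cylindrical section."""
    mass = RHO * length_cm / area_cm2                           # inductance analogue
    compliance = area_cm2 * length_cm / (RHO * C_SOUND ** 2)    # capacitance analogue
    return mass, compliance

def lumped_cutoff_hz(length_cm=0.5, area_cm2=6.0):
    """Cutoff of a ladder of such sections, 1 / (pi * sqrt(L * C))."""
    mass, compliance = section_constants(length_cm, area_cm2)
    return 1.0 / (math.pi * math.sqrt(mass * compliance))
```

Under these assumptions the cutoff depends only on the section length and comes out near 22,000 cps for 0.5-cm sections, well above the speech range, which suggests why so coarse a subdivision can stand in for a continuous line.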

The EVT is actuated by applica- 
tion of a buzz or hiss source (as in 
the Voder) to its input. The output 
is taken across a small inductance 
which gives the frequency character- 
istic of radiation from the lips, and 
this output is amplified and made 
audible in a loudspeaker. 

Because of the close analogy to the 
methods of production of real speech, 
the number of resonances and the 
amplitude relations of harmonics both 
in and between the formants are all 
similar to those of real speech. The 
vowels produced sound very natural, 
especially when the pitch of the 
source is given an inflection, and the 
buzz is faded in and out at a natural 
rate. These changes can be effected 
by the electrical design of auxiliary 
circuits. 

A number of questions of phonetic 
interest can be answered by the EVT. 
The very fact that the whole series of 
non-nasal vowels is produced in good 
quality shows that the acoustical 
counterpart (Figure 5) is an adequate 
model for the explanation of vowel 
discrimination. It is not necessary to 
look for any process further than 
changes in dimensions of cavities and 
constrictions in the tract lying above 
the glottis (with nose cavity added in 
the usual method of production of 
nasal sounds). This eliminates such 
things as resonances in the trachea, 
bronchial tubes, and lungs, changes in 
the harmonic content of the vocal 
cord tone, and changes in the hard- 
ness of the walls of the tract. These 
things may exist but are not necessary 
for vowel discrimination. 

The notion of a separate cavity for 
each formant is shown to be unrealis- 
tic. Also, the assignment of back 
cavity for first resonance, etc., is not
wholly correct. Interaction makes all
formants dependent on the entire 
tract. However, where a chief de- 
pendence can be assigned, the first 
formant may be due more to the 
throat cavity in some vowels ([i] for
example) and more to the mouth 
cavity in others (as in [u]). 

More than one setting of the EVT 
may produce the same vowel. This is 
to be expected, since in both real and 
artificial tracts there are more inde- 
pendent controls than necessary to 
place the two or three formants most 
essential to vowel quality. Other 
resonances in the audible range will 
be changed, however, in the different 
setting, and this may impart a some- 
what different character to the sound, 
even though the vowel is the same. 

Positions and motions of the tongue 
can be confirmed. For example, in the 
diphthong [aɪ], at the start there is
a constriction formed by the base of 
the tongue, while at the end there is 
a constriction in the mouth. At first 
thought, it might be assumed that the 
constriction moves gradually forward 
during the diphthong. By exploration 
of intermediate sounds and by com- 
parison with real speech, the EVT 
has shown that the actual motion is 
a decrease in the throat constriction, 
which is simultaneous with a growth 
of the mouth constriction. 

When a vowel is formed on the 
EVT at a man’s pitch, and the funda- 
mental frequency is then raised to a 
woman’s, the vowel quality is poor 
and the voice is not clearly recog- 
nized as a woman’s. When, in addi- 
tion, the tract length is decreased 
proportionately in both cavities, the 
vowel becomes clear again and the 
voice distinctly a woman’s. The form- 
ant frequencies are raised by the 
shortening, which agrees with known 
facts of real speech. The effect is
even more striking at the still higher
pitch and shorter tract of a small
child. 
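
The dependence of the formants on tract length can be illustrated with the simplest possible case, a uniform lossless tube closed at the glottis and open at the lips, whose resonances fall at odd multiples of the quarter-wave frequency. This is an idealization, not the EVT circuit; the sound velocity of 35,000 cm per second and the tract lengths used below are assumed values.

```python
C_SOUND = 3.5e4  # cm/s, assumed velocity of sound

def tube_resonances(length_cm, count=3):
    """Quarter-wave resonances of a uniform tube, closed at one end."""
    return [(2 * n - 1) * C_SOUND / (4.0 * length_cm)
            for n in range(1, count + 1)]
```

For an assumed 17.5-cm tract this gives 500, 1500 and 2500 cps; shortening the tube to 14 cm raises every resonance by the same factor of 1.25. Raising the fundamental alone changes none of these values, which is consistent with the poor result obtained when only the pitch was raised.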


The Variable Area Tract Analogue. 
A modification of the transmission- 
line circuit has been made by Stevens, 
Kasowski and Fant (59). They have 
put controls on the inductances and 
capacitances of each short line section, 
which has the effect of making the 
simulated cross sectional area of that 
part of the tract variable. Although 
the simplicity of controls of Figure 6 
is lost, it is now possible not only to 
make one cavity smaller in cross sec- 
tion than another, but also to taper 
each cavity and thus more nearly 
approach real configurations. Separate 
inductances for constrictions are not
necessary, for these can be formed by
narrowing one or more line sections in
the desired positions. 


Using this analogue, and adopting a 
standard rate of taper between cavity 
and constriction, Stevens (55, 57) and 
House (30) have been able to specify 
a number of vowels in terms of three 
parameters. These are the radius of 
the tube at the point of greatest con- 
striction, the distance of this point 
from the glottis and the size (ex- 
pressed as ratio of area to length) of 
the mouth opening. They have also 
investigated the formant transitions 
that take place when the tract con- 
figuration is changed to produce a 
stop consonant (58). By the addition 
of an analogue of the nasal cavity, as 
a branch circuit from the main vocal 
tract, they have made studies of nasal 
vowels (31). 


A dynamic analogue of the trans- 
mission-line type has been described 
by Rosen, Stevens, and Heinz (50). In 
this model all changes are made by 
means of electrical voltages, and a 
timing arrangement operates the controls
in the right order to produce
connected speech. Consonants can be 
produced by introduction of a noise 
source at the proper points along the 
simulated vocal tract. 


The Artificial Larynx. Although the 
artificial larynx is listed here among 
devices for artificial speech, in its use 
only the vocal cord tone is artificial. 
The modulation of this tone to pro- 
duce speech is accomplished by the 
real speech apparatus. Its main pur- 
pose is to provide a means of speech 
for those who have lost the use of 
their vocal cords. In the type de- 
scribed by Riesz (49), a reed produces 
the complex sound. It is actuated by 
the user’s breath (when the trachea 
terminates in an opening in the 
throat) or by a bellows. A short tube 
from the reed is placed in the mouth. 
Because of the introduction of the 
source at a point different from that 
of the vocal cords, the output spec- 
trum is somewhat modified, but good 
intelligibility is realized. A comparison 
of speech by the artificial larynx with 
esophageal speech has been made by 
Hyman (33). 

The reed was replaced by a tele- 
phone receiver, with a tube to the 
mouth, in experiments by Firestone 
(19). A small receiver pressed against 
the outer wall of the pharynx is used 
by Wright (66). In both of these 
cases, various kinds of original sources 
of sound can be used, e.g., a bell or a 
puffing locomotive obtained from re- 
cordings. If the source is sufficiently 
rich in harmonics, articulated speech 
can be superimposed upon the original 
character of the source, and novel 
effects are produced. 


Use in Communications 


Some of the means for creating 
artificial speech which have been described
in the preceding paragraphs
are of interest in the field of com- 
munications since they offer possi- 
bilities of transmitting speech over 
channels that have much narrower 
frequency bandwidths than are nor- 
mally required for telephone circuits. 
This possibility exists because the con- 
trols required to synthesize artificial 
speech are few in number and require 
only a low rate of transmission of in- 
formation (13). In the present state of 
the art, the principal problem in mak- 
ing a transmission system using arti- 
ficial speech is the acquisition, from 
the talker’s speech signals, of the 
appropriate control currents required 
to operate the synthesizer. Until re- 
cent years, much more attention has 
been given to the synthesis of artificial 
speech by the various means men- 
tioned in the preceding section, than 
to the analysis of speech for the pur- 
pose of obtaining control currents 
that are measures of its significant
parameters. However, there have been 
developed and demonstrated at least 
three different types of speech com- 
munication systems which depend 
upon the transmission of information 
about the speech signal to control the 
synthesis of artificial speech at the 
receiving end. These are exemplified 
by the channel vocoder of Dudley 
(11), the formant tracking vocoders 
of Munson and Montgomery (44), 
Flanagan (20) and Chang (3), and the 
phonetic element transmission system 
of Dudley and Davis (14). 


The Channel Vocoder. The chan- 
nel vocoder introduced by Dudley 
(11) contains an electrical synthesizing
circuit which is essentially a Voder,
controlled by currents obtained from 
analysis of the talker’s speech. Its 
name vocoder is derived from the 
words VOice CODER. The process
of analyzing and then remaking the
speech sounds is carried out auto- 
matically, with a time delay in the 
over-all process of only a few hun- 
dredths of a second. 

A simplified circuit of the channel
vocoder is shown in Figure 8. The
analyzer of the vocoder performs two 
functions: (1) it measures the spec- 
trum of the speech sounds; and (2) it 
measures the fundamental pitch of the 
voiced sounds. The measurement of 
the spectrum is accomplished by 
splitting the speech signal into a num- 
ber of frequency bands by means of 
filters and by detecting and smoothing 
the signals in each band to obtain a 
series of relatively slowly varying 
control currents. The number of 
band-pass filters used is generally be- 
tween 10 and 30, so that the spectrum 
of the sound is then defined by the 
relative magnitudes of the 10 to 30 
channel control currents. The process 
of speech production may be regarded 
as similar to that of a carrier system, 
in which the modulation of a vocal 
cord tone or a wide band fricative 
noise is effected by movements of the 
tongue, jaws, lips and other parts of 
the articulatory mechanism (12). 
Since the movements of these parts 
are limited to syllabic rates, and there- 
fore are relatively sluggish, the rapid- 
ity of change of the resulting speech 
sounds is generally limited. Corre- 
spondingly, the rates of change of the 
spectrum-defining control currents of 
the vocoder analyzer are relatively 
slow, and thus they can be transmitted 
over channels having narrow fre- 
quency bandwidths. Channel band- 
widths of the order of 15 to 25 cycles 
have generally been used. 
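
The detecting and smoothing step for one channel can be sketched as follows. The sketch assumes that the band-pass filtering has already been done, uses full-wave rectification as the detector, and uses a single-pole smoothing filter with a 25-cycle cutoff; the actual detector and filter designs are not given above, and the names are illustrative.

```python
import math

def tone(freq, amplitude, rate, n):
    """Test signal standing in for the output of one band-pass filter."""
    return [amplitude * math.sin(2 * math.pi * freq * i / rate) for i in range(n)]

def envelope(band_signal, rate, cutoff_hz=25.0):
    """Rectify, then smooth with a one-pole low-pass filter."""
    alpha = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / rate)
    level, out = 0.0, []
    for x in band_signal:
        level += alpha * (abs(x) - level)   # follows the slow changes only
        out.append(level)
    return out
```

For a steady tone the control current settles near the average of the rectified wave and doubles when the tone amplitude doubles. It no longer contains the tone frequency itself, only the slow variation of level, which is why a channel of 15 to 25 cycles is enough to carry it.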

The vocoder analyzer measures the 
fundamental frequency of voiced 
sounds by selecting the fundamental 
with a low-pass filter network and by
using an axis-crossing counter circuit
to obtain a direct current output 
whose magnitude is proportional to 
the fundamental frequency. This pitch 
control signal, like the spectrum-defin- 
ing signals, also changes magnitude at 
a slow rate, corresponding to the pitch 
inflections of the voice. It may also be 
transmitted over a narrow band chan- 
nel of 15 to 25 cycles width. 
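
The axis-crossing measurement amounts to counting sign changes. A minimal sketch, assuming the fundamental has already been isolated by the low-pass network (that filter is not modelled here) and that the count is taken over the whole stretch of samples supplied:

```python
import math

def tone(freq, rate, n, phase=0.3):
    """Test signal standing in for the isolated fundamental."""
    return [math.sin(2 * math.pi * freq * i / rate + phase) for i in range(n)]

def axis_crossing_pitch(samples, rate):
    """Estimate frequency as half the number of axis crossings per second."""
    crossings = sum(1 for a, b in zip(samples, samples[1:])
                    if (a < 0.0) != (b < 0.0))
    return crossings / (2.0 * len(samples) / rate)
```

A sine wave crosses the axis twice per cycle, so the count per second divided by two gives the frequency. With no signal there are no crossings and the result is zero, which corresponds to the absence of current in the pitch channel during silence.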

The combined bandwidth required 
for the spectrum channels and the 
pitch channel is in the order of 300 
cycles, which is a reduction to one- 
tenth of the bandwidth normally used 
for a telephone channel. In terms of 
rate of transmission of information, 
measured in bits per second (53), the 
reduction is in about the same ratio. 
Whereas a channel capacity of 20,000 
to 30,000 bits per second is required 
for transmission of normal telephone 
speech with a signal-to-noise ratio 
that is acceptable, the vocoder system 
signals may be transmitted over chan- 
nels having a total capacity of less 
than 2,000 bits per second.

Figure 8. Simplified diagram of the channel
vocoder introduced by Dudley (11) in
which speech is reduced to fluctuations of
quantities which denote the outputs of
band-pass filters. When these fluctuations
are transmitted, they control similar filters
to cause the speech to be recreated.
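
The bit rates quoted for the telephone channel and for the vocoder can be compared using the usual capacity formula for a band-limited channel with noise, bandwidth times the base-2 logarithm of one plus the signal-to-noise power ratio. The sketch assumes that this is the measure intended, a telephone bandwidth of 3000 cycles, and signal-to-noise ratios of 20 to 30 db; none of these values is stated in the text.

```python
import math

def capacity_bits_per_s(bandwidth_hz, snr_db):
    """Channel capacity B * log2(1 + S/N) for a given signal-to-noise ratio in db."""
    return bandwidth_hz * math.log2(1.0 + 10.0 ** (snr_db / 10.0))
```

Under these assumptions `capacity_bits_per_s(3000, 20)` and `capacity_bits_per_s(3000, 30)` bracket roughly the 20,000 to 30,000 bits per second quoted for telephone speech, and `capacity_bits_per_s(300, 20)` falls just under 2,000. Since capacity is proportional to bandwidth at a fixed signal-to-noise ratio, a tenfold reduction in bandwidth gives a tenfold reduction in bits per second.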


The synthesizer shown in Figure 8
is essentially the same as the Voder,
except for the substitution of controls
operated by currents from the analyzer
in place of the manually operated
controls. The switching of energy
sources from hiss to buzz is
accomplished by the pitch control 
signal. When there is no speech ap- 
plied to the analyzer or when the 
speech is unvoiced, there is no current 
in the pitch control channel. When 
voiced sounds are present, there is a 
pitch current whose magnitude is a 
measure of the fundamental fre- 
quency. The presence of this pitch 
current actuates the switch in the 
synthesizer, from hiss to buzz source. 
The magnitude of the pitch signal also 
controls the frequency of oscillation 
of the buzz source, so that the inflec- 
tions of the original speech are repro- 
duced in the synthesized speech. 

There is some loss of naturalness in 
speech transmitted by vocoder sys- 
tems. However, the quality of trans- 
mission is such that relatively high 
syllable articulations are obtainable 
with it. Halsey and Swaffield (28), in 
a discussion of the potentialities of 
the channel vocoder in long distance 
telephony, reported syllable articula- 
tions of 90 percent, which correspond 
to sentence intelligibilities of 99 per- 
cent. 


Formant Tracking Vocoders. It has 
been shown by a number of investi- 
gators (44, 60, 62) that intelligible 
artificial speech may be produced by 
synthesizing arrangements using two 
or more adjustable resonant circuits 
to define the speech spectrum. The 
number of controls required to pro- 
vide an acceptable synthesis with this 
kind of device is not large. Lawrence 
(39) used only six controls for a syn- 
thesizer which was basically of this 
type. Munson and Montgomery used 
eight in their resonance vocoder system.

Figure 9. Simplified diagram of the resonance vocoder developed by Munson and
Montgomery (44). Three adjustable resonating circuits are used, whose frequencies and
amplitudes are transmitted and control similar resonant circuits at the receiver.


Figure 9 shows a simplified diagram 
of the resonance vocoder developed 
by Munson and Montgomery (44). 
In the analyzer the speech band was 
split into three parts: 300 to 1100, 900 
to 3000 and 3000 to 8000 cycles. In 
each band both the frequency of axis 
crossings in the wave form and the 
total energy were measured. The 
fundamental pitch of the vocal cord 
tone and the energy in the band from 
40 to 400 cycles were also measured 
by the analyzer. These eight param- 
eters were transmitted to the syn- 
thesizer which reconstructed the 
speech. The synthesizer contained 
two signal sources similar to those 
used in the channel vocoder, a cord 
tone with controlled pitch and a wide 
band noise. These two signals were 
passed through resonant networks 
having controllable peak frequencies. 
The frequency responses of the net- 


works simulated the characteristics of 
speech formants. The peak frequen- 
cies were shifted by variable induct- 
ance coils controlled by the axis-cross- 
ing counting devices in the analyzer. 
With this system, Munson and Mont- 
gomery reported vowel articulation 
scores of nearly 100 percent and con- 
sonant articulation of about 70 per- 
cent. The total bandwidth required 
for transmission of the eight control 
signals is of the same order as that of 
the channel vocoder, about 300 cycles. 
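
The two analyzer measurements described above, the frequency of axis crossings and the total energy in each band, can be illustrated with a short modern sketch. The band edges are those given in the text. Everything else (the function names, the sampling rate, the FFT-based band limiting and the test tone) is an assumption made for illustration and is not a description of the Munson and Montgomery circuit.

```python
import numpy as np

BANDS_HZ = [(300, 1100), (900, 3000), (3000, 8000)]  # band edges from the text

def band_limit(signal, rate, low, high):
    """Crude band-pass (an assumption): zero all FFT bins outside [low, high]."""
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / rate)
    spectrum[(freqs < low) | (freqs > high)] = 0.0
    return np.fft.irfft(spectrum, n=len(signal))

def axis_crossing_frequency(signal, rate):
    """Frequency implied by the axis-crossing count: crossings per second / 2."""
    signs = np.signbit(signal)
    crossings = np.count_nonzero(signs[1:] != signs[:-1])
    duration = len(signal) / rate
    return crossings / (2.0 * duration)

def analyze(signal, rate):
    """Return (axis-crossing frequency, total energy) for each of the three bands."""
    results = []
    for low, high in BANDS_HZ:
        band = band_limit(signal, rate, low, high)
        results.append((axis_crossing_frequency(band, rate), float(np.sum(band ** 2))))
    return results

# Hypothetical test signal: a 700-cycle tone, which lies in the first band only.
RATE = 20000
t = np.arange(RATE) / RATE
tone = np.sin(2 * np.pi * 700 * t)
measurements = analyze(tone, RATE)
```

For the test tone, the first band should report an axis-crossing frequency close to 700 cycles and nearly all of the energy, while the other two bands should report almost none.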

Flanagan and House (24) have re- 
ported on a design of a formant track- 
ing vocoder in which seven electrical 
signals are used to convey information 
about the frequencies of the first three 
formants, the amplitudes of the voiced 
and fricative sounds, the fundamental 
vocal frequency and the frequency 
of the spectral maximum of the fricative
sounds. The seven control signals
occupy a total bandwidth of about
60 cycles and in the speech trans- 
mitted over the system, the vowel 
articulation is reported to be about 80 
percent and the consonant articulation 
about 25 percent. Other investigators 
who have described experiments with 
formant tracking vocoders are Chang 
(3) and Howard (32), who used 
variable tuned filters in order to ob- 
tain a more refined measure of form- 
ant frequencies. 


Phonetic Element Transmission Sys- 
tem. A third type of transmission 
system which makes use of artificial 
speech, but which does not use con- 
tinuous parameters, is one in which 
sounds are automatically recognized 
as discrete units. A system based on 
this principle has been described by 
Dudley and Davis (14). The analyzer 
of this system operates on what are 


referred to as phonetic elements, 
which are defined as elements of the 
speech signal that have duration times 
comparable to those of phonemes, but 
which are differentiated solely by 
their spectra. A number of investi- 
gators have studied the problem of 
automatic sound recognition which is 
the function of the analyzer or send- 
ing end of this system. Smith (54), 
Davis, Biddulph and Balashek (7), 
Fry and Denes (26, 27) and Olson 
and Belar (46) have worked with a 
variety of pattern matching tech- 
niques, while Chang (3) and Wiren 
and Stubbs (65) have reported on 
techniques for binary selection of 
sound characteristics in an automatic 
recognizer. 

By analyzing speech into discrete
phonetic elements for transmission
over a narrow band system, a large
saving in bandwidth is made possible,
as compared to channel vocoder systems.
It appears from Dudley and
Davis’ experiments (14) that a bandwidth
of the order of 75 cycles or
less would be sufficient, of which
about a third would be needed to
transmit the pitch control signal and
the remainder would be used to transmit
the information as to which phonetic
element was being spoken.

Figure 10. Schematic diagram of the phonetic element transmission system of Dudley and
Davis (14), which recognizes what ‘phonetic element’ is being spoken and transmits a code
to cause the same sound to be recreated at the receiver.

The circuit of the phonetic element 
transmission system of Dudley and 
Davis is shown schematically on 
Figure 10. In the form originally de- 
scribed, only 10 phonetic elements 
were identified by the analyzer. Three
or four times this many would prob- 
ably be required to provide transmis- 
sion of all the sounds essential to a 
language. The 10 phonetic elements 
used had spectra which corresponded 
with those of the sounds [s, f, i, 1, ¢, 
a, 0, u, 3, and n].

The operation of the analyzer is 
simply one of matching the incoming 
speech spectrum, moment by moment, 
with 10 pre-set spectrum patterns and 
automatically selecting the best match. 
This process classifies any incoming 
sound into one of 10 phonetic ele- 
ments. Information as to which of the 
10 is present is transmitted by an ‘on- 
off’ signal to the synthesizer. The 
analyzer also performs the function 
of measuring the fundamental voice 
frequency, as in the case of the chan- 
nel and formant tracking vocoders. 
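
The matching operation described above can be sketched in a few lines. This is only an illustration under stated assumptions: the pre-set patterns are random placeholder spectra, and the distance measure (Euclidean distance between level-normalized spectra) and the function names are hypothetical choices, not the rule used in the Dudley and Davis analyzer.

```python
import numpy as np

def best_match(spectrum, patterns):
    """Return the index of the pre-set pattern closest to the incoming spectrum.

    Closeness is Euclidean distance between level-normalized spectra
    (an assumed measure, not the original circuit's rule).
    """
    spectrum = np.asarray(spectrum, dtype=float)
    patterns = np.asarray(patterns, dtype=float)
    # Normalize overall level so that loudness does not affect the choice.
    spectrum = spectrum / np.linalg.norm(spectrum)
    patterns = patterns / np.linalg.norm(patterns, axis=1, keepdims=True)
    distances = np.linalg.norm(patterns - spectrum, axis=1)
    return int(np.argmin(distances))

def on_off_code(index, n_elements=10):
    """One 'on-off' signal per phonetic element; only the selected one is on."""
    return [1 if i == index else 0 for i in range(n_elements)]

# Hypothetical pre-set patterns: 10 elements, each with energy in 10 sub-bands.
rng = np.random.default_rng(0)
patterns = rng.random((10, 10)) + 0.1
incoming = 3.0 * patterns[4] + 0.01 * rng.random(10)  # louder, slightly noisy copy
selected = best_match(incoming, patterns)
code = on_off_code(selected)
```

Because every incoming spectrum is forced onto the nearest of a fixed set of patterns, a sketch of this kind also makes plain why a set of patterns adjusted to one voice may serve another voice poorly.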

The synthesizer is provided with 
two types of signals like those used 
in vocoders, the buzz or cord tone 
with controllable pitch and the wide 
band noise. These signals are applied 
to the inputs of electrical networks 
under control of the channel signals 
from the analyzer. The reconstructed 
speech is heard at the paralleled out- 
puts of the networks. 


It is of interest to note that by
varying the duration, some of the
phonetic elements in this system may 
be made to serve as two different 
speech sounds. For example, the pho- 
netic element having the spectrum of 
[s] gives a good simulation of the 
sound [t] when the duration is short. 
Correspondingly, the [t] sound, as in 
the word eight, is classified by the 
analyzer as an [s] of short duration. 

Operation of this system is subject 
to the limitation that the analyzer is 
not generally applicable to all talkers’ 
voices, but may be adjusted to operate 
best on one given voice. Differences 
between men’s, women’s and chil- 
dren’s voices make it difficult, if not 
impossible, to match satisfactorily all 
the incoming speech sounds with a 
finite number of pre-set spectrum 
patterns. 

Speech transmitted by this system 
has normal pitch inflections and nor- 
mal timing of the phonetic elements, 
but does not have variations of ampli- 
tude correlated with those of the 
original speech. Notwithstanding, the
talker’s individual speech characteris- 
tics may be perceived by the listener 
to a considerable degree, although 
the quality of the reconstructed 
speech is somewhat inferior to that 
attainable with the channel vocoder. 


Summary 


While attempts at producing speech 
artificially were being made nearly 
two hundred years ago, modern tech- 
nology has greatly expanded the pos- 
sibilities in this field. The most im- 
portant applications are in phonetics 
research, and in communications. This 
paper describes a number of methods 
in use at present or in the recent past, 
and indicates what kinds of results are
being obtained.


Stuttering Severity During Prolonged
Spontaneous Speech 


Clyde L. Rousey 


Until recently there has been little 
experimental work dealing primarily 
with adaptation of stuttering during 
spontaneous speech. This is not sur- 
prising when considered in terms of 
the experimental controls which can 
be effected in oral reading of prepared 
passages as contrasted with the diffi- 
culties in obtaining similar controls in 
spontaneous stuttered speech. How- 
ever, some research concerned with 
the spontaneous speech of stutterers 
has been completed. 


An early study by Harris (6) 
showed no significant transfer of the 
decreased severity observed in succes- 
sive oral readings of a passage to sub- 
sequent spontaneous speech. In a study 
of the spontaneous speech of stutter- 
ers, Cohen (4) demonstrated that 
when oral reading material was con- 
stantly changed, there was a statis- 
tically significant decrement in sever- 
ity of stuttering. This was not true of 
the subject’s spontaneous speech. 
Cohen felt a larger sample might have 
shown statistical significance. Three 
later studies by Moore (9), Newman 
(10) and Schaef (11) have confirmed 
Cohen’s latter impression. 


In the present study, further investigation
was made of possible decrement
of stuttering during prolonged
periods of spontaneous speech. The
hypothesis evaluated was that stutter- 
ing severity would decrease during 
spontaneous speech continued over a 
period of five consecutive days with 
ten consecutive hours of spontaneous 
speech each day. 


Procedure 


Subjects. Eighteen stutterers from 
high schools in the Chicago area were 
used as the experimental group. There 
were five females and 13 males in the 
group. The ages of the subjects ranged 
from 13 years and nine months to 17 
years and seven months. Intelligence 
quotients obtained on the Henmon- 
Nelson Tests of Intelligence—Form A 
(7) ranged from 67 to 133. The mean 
intelligence quotient was 103.72 with 
a standard deviation of 4.51. No sub- 
ject having a very unsatisfactory ad- 
justment on the Bell Adjustment In- 
ventory for High School Students (3) 
was included in the experiment. 

Fifteen of the subjects were able to 
give information about the number of 
years that they had been stuttering. 
The range in time was from two to 13 
years. The mean number of years was 
8.3 and the standard deviation was 3.8 
years. Seventeen of the subjects gave
information concerning what they believed
was the cause of stuttering and
what steps they had taken to alleviate
the problem. 

The reported causes may be roughly 
classified as follows: five cases had an 
unknown cause, eight felt some form 
of an emotional problem precipitated 
stuttering, one subject felt stuttering 
was caused by talking too fast, two 
subjects blamed the problem on a 
physical illness, and one subject felt 
that stuttering was the result of a 
combination of physical and emotional 
factors. 


Five subjects had had public school 
speech therapy in grade school. Only 
one subject had begun therapy in high 
school. In four cases therapy had been 
discontinued after grade school. Pri- 
vate help was given for a short time to 
one of the subjects. Of the remaining 
six, one half had never received ther- 
apy of any kind, and the remainder 
had had therapy in grade school with 
additional private therapy. 

Data on the genetic and birth his- 
tory of the subjects were generally 
inconclusive due to parental confusion 
or forgetfulness over pertinent ages. 
Using Stander’s (12) criteria, seven 
subjects whose birth weight was over 
eight pounds could be classed as over- 
weight babies. 


Materials and Testing Procedures. 
The basic task of each subject was to 
talk continuously and spontaneously 
for ten hours a day during five con- 
secutive days. Subject headings used 
for cataloguing in high school libraries 
(5) were printed on individual slips 
of paper and randomly placed in a 
small box. The stutterers were in- 
structed to use these topics as stimuli 
for their talking. No restrictions were 
placed upon the length of time spent 
talking on any one topic or upon the 
number of topics covered. 


Four-minute tape recordings and
visual observations of overt stuttering
symptoms were made each day (1) 
when the stutterer began talking, (2) 
at succeeding two-hour intervals and 
(3) at the end of the experimental 
day. A Pentron tape recorder was used 
to record the speech of each subject 
and the use of a modification of the 
Barr checklist of symptoms (2) per- 
mitted tabulation of the overt second- 
ary stuttering symptoms. A reliability 
study was conducted on the visual ob- 
servation of the number of overt 
secondary symptoms. This reliability 
was established using the formula 
$$\text{Reliability} = \frac{C}{\sqrt{xy}}$$
where C = the total number of 
judged secondary symptoms, x = the 
experimenter’s judged secondary 
symptoms, and y = the independent 
observer’s judged secondary symp- 
toms. These judgments made by the 
experimenter and five clinicians from 
the speech clinic of Northwestern 
University and Western Michigan 
College of Education gave reliability 
scores ranging from .81 to .87. 
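
As a minimal sketch, the reliability index can be computed as follows. The printed formula is read here as C divided by the square root of the product xy, with C taken as the number of symptoms judged by both observers; that reading is an assumption, and the counts in the example are hypothetical, not data from the study.

```python
import math

def agreement_reliability(c, x, y):
    """Reliability index C / sqrt(x * y).

    c: symptoms judged by both observers (assumed reading of C)
    x: symptoms judged by the experimenter
    y: symptoms judged by the independent observer
    """
    if x <= 0 or y <= 0:
        raise ValueError("both observers must report at least one symptom")
    return c / math.sqrt(x * y)

# Hypothetical counts, not data from the study.
example = agreement_reliability(c=42, x=50, y=48)
```

With these hypothetical counts the index is about .86, which happens to fall within the range of reliability scores reported above.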
Because of the length of the ex- 
periment, it was not feasible to keep 
an audience constantly before the sub- 
ject. Therefore, only the experimenter 
entered the room while the stutterer 
was speaking. By means of a loud 
speaker system, precautions were 
taken to insure that each stutterer 
felt he was constantly being monitored 
by an audience. 


Analysis of the recordings was con- 
ducted from two approaches. First, 
a seven-point rating scale was used for 
judging overall severity of stuttering 
as heard on the recorded speech 
samples. Judgments were not made by 
the experimenter but by another 
trained speech pathologist. Severity 
measures ranged from a low of one in- 
dicating little difficulty to a high of 
seven indicating extreme difficulty. 








42 JOURNAL OF SPEECH AND HEARING RESEARCH 


TABLE 1. Means of the measures of stuttering severity on an intra-day basis. Each entry is the
mean of measures for 18 subjects. A measure of an observation period was obtained by summing
one observation period over five days.

                                              Observation Periods
                                   1         2         3         4         5         6
Number of words                 2109.22   2346.38   2310.16   2243.50   2396.38   2416.88
Number of secondary symptoms     133.55    123.22    118.55    106.11     92.33     92.88
Rated severity                    18.22     16.72     16.44     16.61     16.33     16.61








After all the ratings had been made, a 
random sample of 20 of the recordings 
already rated was prepared. These 20 
recordings were rated a second time. 
An obtained Pearson r of .97 measuring
the relationship between the two
sets of ratings provided evidence of 
satisfactory reliability. 
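
The rerating check described above amounts to a product-moment correlation between the first and second sets of ratings. The following sketch shows the computation; the function name and the ratings are hypothetical and are not the ratings obtained in the study.

```python
import math

def pearson_r(first, second):
    """Pearson product-moment correlation between two equal-length sequences."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    n = len(first)
    mean_a = sum(first) / n
    mean_b = sum(second) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(first, second))
    var_a = sum((a - mean_a) ** 2 for a in first)
    var_b = sum((b - mean_b) ** 2 for b in second)
    return cov / math.sqrt(var_a * var_b)

# Hypothetical first and second ratings on the seven-point scale.
first_ratings = [1, 2, 2, 3, 4, 4, 5, 6, 6, 7]
second_ratings = [1, 2, 3, 3, 4, 5, 5, 6, 7, 7]
r = pearson_r(first_ratings, second_ratings)
```

The same function would serve for the inter-correlations among the three severity measures reported later in the article.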

In addition to the severity ratings, 
a written transcription was made of 
the recorded samples. A count of the 
words spoken during each recording 
period was made from this transcrip- 
tion. The assumption was that verbal 
output is affected by severity of stut- 
tering. No reliability study of the 
count of words spoken was conducted 
because of the simplicity of the task. 

For each recording period, the ex- 
perimenter was thus able to express 
quantitatively three indices of stutter- 


ing severity: (1) rated severity of the 
stuttering speech, (2) the number of 
observed visible secondary symptoms, 
and (3) the number of words spoken. 
If a decrement in stuttering severity 
occurred during spontaneous speech, 
this should be reflected in each of 
these three measures. That is, there 
should be a decrease in observed 
symptoms, a decrease in the rated 
severity of stuttering and an increase 
in the verbal output. It was possible 
to evaluate the data on an intra-day 
basis, and by summing the scores for 
each day, to evaluate the data on an 
inter-day basis. 


Results 


Intra-day Stuttering Severity. Mean 
scores for each observation period of 


TABLE 2. Trend analysis of the number of words spoken on an intra-day basis.

Source                                              ss      df        ms        F       P
Ve (individual deviations from estimation)     28,062.78    62*     452.63
Vgdfl (group deviation from linearity)          6,722.29     4    1,680.57     3.71    <.01
Vis (individual slopes)                         4,921.92    17      289.52      .64    >.05
Vgs (group slope)                               5,028.01     1    5,028.01    11.10    <.01








*6 df were lost because of approximation made of information lost through mechanical failure (11).


TABLE 3. Trend analysis of the number of secondary symptoms on an intra-day basis.

Source                                              ss      df        ms        F       P
Ve (individual deviations from estimation)     22,939.85    68      337.35
Vgdfl (group deviation from linearity)          1,015.05     4      253.76      .75    >.05
Vis (individual slopes)                        17,240.45    17    1,014.14     3.01    <.01
Vgs (group slope)                              24,404.05     1   24,404.05    72.52    <.01








stuttering severity on an intra-day 
basis are shown in Table 1. Two hours 
elapsed between each two periods. It 
is evident that the number of words 
spoken increased on the average from 
the beginning to the end of a day of 
speaking. There is a sudden increase 
in words spoken which occurred 
within the first two hours. Not until 
the last two hours before the day was 
over is a comparable gain seen. The 
data were evaluated by the method 
of trend analysis described by Alex- 
ander (1). This allowed a study of 
both the individual and group reac- 
tions. Results of this statistical analysis 
are found in Table 2. The test of the 
group reaction to the experimental 
variable is statistically significant be- 
yond the one per cent level. The pat- 
tern of reaction was not rectilinear as 
shown by the significant F-ratio in the 
test of the group deviation from 


linearity. That the reaction is typical 
of the group as a whole is shown by 
the failure of the test of individual 
variance to be significant. Initial con- 
firmation of the hypothesis under 
study is provided by these results. 
The frequency of secondary symp- 
toms occurring throughout the day 
shows a continued decrease until the 
last observation of the day. As can be 
seen from Table 1, there is again a 
marked change following the first two 
hours of speaking. Following the ini- 
tial drop in frequency, there is rela- 
tively little change between periods 
two and three, after which another 
sharp decrement occurs. The decre- 
ment in secondary symptoms, as in- 
dicated in Table 3, is significant be- 
yond the one per cent level. The 
statistical analysis also indicates signi- 
ficant individual variation. In fact, at 
the end of the experiment three stut- 


TABLE 4. Trend analysis of rated severity of stuttering on an intra-day basis.

Source                                              ss      df        ms        F       P
Ve (individual deviations from estimation)        161.2     62*       2.61
Vgdfl (group deviation from linearity)             22.9      4        5.73     2.20    >.05
Vis (individual slopes)                            44.0     17        2.59     1.00    >.05
Vgs (group slope)                                  21.1      1       21.10     8.12    <.01

*6 df were lost because of approximation made of information lost through mechanical failure.







TABLE 5. Inter-correlations between the rating of severity, the number of secondary symptoms
and the number of words spoken on an intra-day basis.

          Rating of Severity     Number of Words        Rating of Severity
Period    and Number             Spoken and             and
          of Words Spoken        Secondary Symptoms     Secondary Symptoms
1              -.70                   -.66                   +.89
2              -.77                   -.75                   +.90
3              -.69                   -.70                   +.92
4              -.72                   -.74                   +.94
5              -.75                   -.76                   +.93
6              -.72                   -.74                   +.90








terers were noted to have more 
secondary symptoms than when they 
began. 

In the final intra-day measure, the 
auditory rating of stuttering severity, 
Table 1 suggests a pattern of behavior 
different from that indicated by the 
previous two measures. While there 
is the initial drop in severity during 
the early hours of the experiment 
which is similar to the previous pat- 
terns, a plateau-like effect after two 
hours of spontaneous speech appears 
fairly constant. Despite this effect, the 
change noted yields an F-ratio signifi- 
cant beyond the one per cent level. 
Table 4 presents the complete results 
of the statistical analysis. 

The trends of severity which are 
apparent on all three measures suggest 
some degree of relationship among 
these methods of measuring severity. 
Relationships were evaluated by Pearson
r’s computed for each of the six


periods. The obtained correlations, 
reported in Table 5, are all significant. 
It is evident that a higher relationship 
exists between the two measures ob- 
tained by rating severity and by 
counting secondary symptoms than 
that between either of these two 
measures and verbal output. The r’s
for the first mentioned comparison
ranged from .89 to .94 and for the
other two comparisons from -.66 to
-.77.


In summary, results of the analysis 
of the three measures of stuttering 
severity on an intra-day basis confirm 
the hypothesis that a decrement in 
stuttering severity occurs during pro- 
longed spontaneous speech. Further, 
this decrement can be said to be 
characteristic of the group as a whole 
with significant individual differences 
occurring only in the trend of severity 
as measured by the number of second- 
ary symptoms. 


TABLE 6. Means of the measures of stuttering severity on an inter-day basis. Each entry is the
mean of measures for 18 subjects. A measure for a day was obtained by summing all six
observations made during any given day.

                                                   Days
                                   1         2         3         4         5
Number of words                 2488.94   2662.27   2855.50   2860.05   2955.79
Number of secondary symptoms     183.89    148.06    128.67    103.83    101.89
Rated severity                    23.17     19.50     19.83     19.72     18.72


TABLE 7. Trend analysis of the number of words spoken on an inter-day basis.

Source                                              ss      df        ms        F       P
Ve (individual deviations from estimation)      23,982.8    45*     532.95
Vgdfl (group deviation from linearity)             675.6     3      225.20      .42    >.05
Vis (individual slopes)                         18,579.0    17    1,092.88     2.05    <.05
Vgs (group slopes)                              24,827.8     1   24,827.80    46.59    <.01








*6 df were lost because of approximation made of information lost through mechanical failure. 


Inter-day Stuttering Severity. The 
data were also evaluated with respect 
to inter-day changes. Scores for every 
observation or rating made each day 
were summed to give the daily score. 
The summary for all three measures 
appears in Table 6. 

When the number of words spoken 
is considered, a marked change is seen 
from the first through the fifth day. 
The trend is similar in some respects 
to that observed on an intra-day basis 
with the same measure. There is a 
sharp initial increase in word fre- 
quency and a second less marked rise 
between days four and five. In the test 
of group trend, the obtained F-ratio 
reported in Table 7 is significant be- 
yond the one per cent level. Individual 
trends within this group pattern were 
significantly different beyond the five 
per cent level. 
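
The entries of Table 7 can be checked arithmetically. The sketch below assumes, as an inference from the printed figures rather than from a statement in the text, that each mean square is the sum of squares divided by its degrees of freedom and that each F-ratio is a mean square divided by the mean square for individual deviations from estimation. The variable and function names are illustrative.

```python
# Sums of squares and degrees of freedom as printed in Table 7.
TABLE_7 = {
    "group deviation from linearity": (675.6, 3),
    "individual slopes": (18579.0, 17),
    "group slope": (24827.8, 1),
}
ERROR_SS, ERROR_DF = 23982.8, 45  # individual deviations from estimation

def mean_square(ss, df):
    """Mean square: sum of squares divided by degrees of freedom."""
    return ss / df

def f_ratios(sources, error_ss, error_df):
    """F-ratio of each source against the error mean square (assumed error term)."""
    error_ms = mean_square(error_ss, error_df)
    return {name: mean_square(ss, df) / error_ms for name, (ss, df) in sources.items()}

ratios = f_ratios(TABLE_7, ERROR_SS, ERROR_DF)
```

Under these assumptions the computed ratios agree, to two decimal places, with the F column of Table 7.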


When the criterion of stuttering 
severity was the number of secondary 
symptoms, there was again a notice- 
able change over the five-day period. 
The number drops rapidly from the 
first through the fourth experimental 
day. Although it continues to decline 
on the final day, the decrement is 
slight. It is difficult to judge the extent 
to which this abrupt change in trend 
is a function of the actual adaptation 
phenomenon, and the extent to which 
it is dependent on the stutterer’s 
knowledge that he is concluding his 
talking. The change in trend at day 
four is abrupt and the statistical anal- 
ysis, reported in Table 8, gives evi- 
dence of a departure from linearity. 
This change does not follow the same 
inter-day pattern found for verbal 
output. The highly significant F ob- 
tained for the group effect confirms 


TABLE 8. Trend analysis of the number of secondary symptoms on an inter-day basis.

Source                                              ss      df        ms        F       P
Ve (individual deviations from estimation)      29,034.9    51      569.31
Vgdfl (group deviation from linearity)           5,718.9     3    1,906.30     3.35    <.05
Vis (individual slopes)                         50,069.7    17    2,945.28     5.17    <.01
Vgs (group slopes)                              78,041.7     1   78,041.70   137.08    <.01


TABLE 9. Trend analysis of rated severity of stuttering on an inter-day basis.

Source                                              ss      df        ms        F       P
Ve (individual deviations from estimation)         378.7    45*      84.16
Vgdfl (group deviation from linearity)              77.9     3       25.97      .31    >.05
Vis (individual slopes)                            275.0    17       16.18      .19    >.05
Vgs (group slopes)                                 135.2     1      135.20     1.61    >.05








*6 df were lost because of approximation made of information lost through mechanical failure. 


the reliability of the trend. Again the 
test of individual slopes reveals differ- 
ences significant at the five per cent 
level. 

As with the other two inter-day 
measures, rated stuttering severity 
shows a decrement from the first 
through the fifth day. However, the 
trend is not of statistical significance. 
This can be seen from the F-ratios re- 
ported in Table 9. Examination of the 
pattern presented in the daily scores 
shows an early decrement between 
day one and day two followed by a 
plateau, and then a continued decre- 
ment through day five. 

A study of the inter-relationships 
between the three measures was also 
made. The highest Pearson correla- 
tions occur between the measures of 
rated stuttering severity and of the 
number of secondary symptoms ob- 
served. The results of the correlation 
study on an inter-day basis are similar 


to those found on intra-day measures. 
The obtained correlations are shown 
in Table 10. Pearson r’s measuring
the relationships between the number
of moments of stuttering and the totals
of each of the three measures were
computed. The highest r of +.62 was
obtained for the relationship with the
number of secondary symptoms. The
r’s measuring the relationships between
number of moments of stuttering and
(1) rated severity and (2) number of
words were .51 and -.57, respectively.


Discussion 


The obtained results apparently sup- 
port the hypothesis under study. A 
significant decrement in the severity 
of stuttering during spontaneous 
speech occurred on the average on 
all three measures both on an intra- 


TABLE 10. Inter-correlations between the severity rating scale, the number of secondary
symptoms, and the number of words spoken on an inter-day basis.

          Rating of Severity     Number of Words        Rating of Severity
Day       and Number             Spoken and             and
          of Words Spoken        Secondary Symptoms     Secondary Symptoms
1              -.69                   -.65                   +.78
2              -.67                   -.69                   +.83
3              -.78                   -.75                   +.94
4              -.72                   -.68                   +.88
5              -.68                   -.69                   +.91




















day and inter-day basis with one exception.
The rated severity of stuttering
on an inter-day basis failed to
show this significant decrement. None
of the stutterers completely stopped 
observable stuttering as a result of the 
experimental technique. In fact, two 
of the group demonstrated more stuttering
than when they began. There
exists a need for study of the diagnos- 
tic criteria or other information which 
would differentiate those who make 
and those who will not make progress. 
Additional observations of the speech 
behavior of these subjects after the 
completion of the experiment were 
not made. It is not possible to say 
whether any permanent change was 
effected. 

Possibly the early decrement which 
occurred between period 1 and period 
2 of each day as well as between day 
1 and day 2 represented only the stut- 
terer’s acclimation to the experimen- 
tal task. Further, the use of the same 
tape recorder and observer may have 
facilitated the adaptation noted. Additional
research utilizing a changing
audience might further clarify the na- 
ture of this decrement. 


Summary 


Eighteen adolescent stutterers talked 
ten hours a day for five consecutive 
days. The severity of their stuttering 
was measured at regular intervals by 
using a subjective rating of severity, a 
count of the number of visible second- 
ary symptoms, and a count of the 
number of words spoken per observation.
A statistically significant decrement
in stuttering was generally
demonstrated.


Speaking Time Behavior Of The Stutterer
Before And After Speech Therapy 


William D. Trotter 


Louisa Brown 


The importance of speaking time be- 
havior with respect to stuttering has 
been pointed out by Johnson (1):


Speaking time is a fundamental in- 
dicator of the degree to which the per- 
son is handicapped by his stuttering. A 
stutterer who speaks less than two min- 
utes per day, for example—and there 
are actually some who do no more 
speaking than this—is allowing his 
speech difficulty to affect him more than 
one who speaks a half hour or so per 
day. (Incidentally, probably most non- 
stuttering adults average not over 30 
to 45 minutes of speaking daily.) The 
one who speaks two minutes or less 
per day is probably far more handi- 
capped, even though the other stutterer 
who talks 30 minutes or more may have 
much more trouble in terms of percent- 
age of words stuttered or degree of mus- 
cular tension and strain. After all, the 
importance of a particular individual’s 
stuttering is felt by him in a very basic
way in the extent to which it inhibits
his communication with other people.


Schools. This article is based on a paper
given at the 1956 convention of the American
Speech and Hearing Association. The
University of Iowa in a program of research
in stuttering therapy directed by
Wendell Johnson under a grant from the
Louis W. and Maud Hill Family Foundation.


If, as Johnson hypothesizes, speak- 
ing time is ‘a fundamental indicator of 
the degree to which the person is 
handicapped by his stuttering,’ then 
it would seem important to know 
whether there are any changes in the 
stutterer’s speaking time following a
period of speech therapy. 


Procedure 


The subjects were 15 adult stut- 
terers (12 male and three female), 
ranging in age from 17 to 26 years. 
Thirteen were full-time University of 
Iowa students, one was a housewife 
and one was a high school student. 

The subjects were asked at the be- 
ginning and end of a three month 
therapy period to keep a speaking time 
record for three days according to a 
procedure described by Johnson (1). 
This procedure consists essentially of 
having the stutterer record in a small 
pocket notebook the names of the 
people to whom he speaks and the 
amount of time in seconds that he 
spends speaking to them. The main 
caution to be observed is that the person
does not overestimate his speaking
time. At the end of the day he totals
up the time, divides by 60, and this
is his daily speaking time in minutes.
For the purpose of this experiment the
stutterers were asked to break down
speaking time into eight categories.
These categories were public speaking
situations, telephone calls to strangers,
classroom speaking situations, telephone
calls to friends, talking with
strangers, talking with family, group
speaking and talking with friends.

TABLE 1. Average daily speaking time of 15 stutterers before and after a period of therapy.

                       Average Daily Speaking     Average Daily Speaking
Stutterer     Age      Time Before Therapy        Time After Therapy
                       in Minutes                 in Minutes
 1            22            48.42                      89.58
 2            24            20.47                      39.36
 3            25            36.30                      86.11
 4            22            17.51                      32.43
 5            24            34.69                      54.83
 6            20            50.26                      39.56
 7            21             6.88                      33.50
 8            17            20.00                      46.89
 9            18            10.00                      24.67
10            24             7.87                      14.94
11            20            48.34                      79.06
12            19             (i)                       10.03
13            19             6.08                       9.99
14            19            28.00                      27.67
15            26            61.43                      77.77
Mean                        26.84
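
The bookkeeping in the speaking time record described above is simple enough to sketch directly: entries in seconds are totaled, optionally by category, and divided by 60. The function name, the data layout and the sample entries are hypothetical illustrations, not part of the procedure described by Johnson.

```python
from collections import defaultdict

def daily_speaking_minutes(entries):
    """Total one day's notebook entries (in seconds) and convert to minutes.

    entries: iterable of (category, seconds) pairs.
    Returns (total_minutes, minutes_by_category).
    """
    seconds_by_category = defaultdict(float)
    for category, seconds in entries:
        if seconds < 0:
            raise ValueError("speaking time cannot be negative")
        seconds_by_category[category] += seconds
    by_category = {name: secs / 60.0 for name, secs in seconds_by_category.items()}
    return sum(by_category.values()), by_category

# Hypothetical notebook entries for one day.
day = [
    ("talking with friends", 540),
    ("classroom speaking situations", 90),
    ("talking with friends", 360),
    ("telephone calls to friends", 210),
]
total, breakdown = daily_speaking_minutes(day)
```

Averaging the daily totals over the three recorded days would then give a figure comparable to the average daily speaking times reported for each stutterer.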
The general plan of therapy fol- 
lowed by the adult stutterer at the 
University of Iowa Speech Clinic has 
been described in several publications 
(2, 3, 4, 6). Perhaps the most detailed 
account of the therapy received by 
the stutterers of this experiment is 
given by Johnson and Trotter (2). 


Results 
Table 1 shows the average daily 


speaking time for three days before 
and after a period of therapy. An 


examination of this table shows that 
13 of the 15 stutterers showed an in- 
crease in speaking time following 
therapy. The mean speaking time of 
the group before therapy was 26.8 
minutes and after therapy was 44.4
minutes. A t-test was used to evaluate 
the significance of the difference be- 
tween the two means. The value of t 
is 15.9 and is significant beyond the 
one percent level. 


Johnson (1) has estimated that the 
average non-stutterer speaks between 
30 minutes and 45 minutes a day. 
Table 1 shows that before therapy 
only 40 percent of the stutterers of 
this experiment spoke more than 30 
minutes, whereas after therapy ap- 
proximately 73 percent spoke more 
than 30 minutes. It is interesting to 
note in this respect that Trotter (5), 
in a study of the speaking time be- 
havior of the non-stuttering college 
student, found that 75 percent of his 
subjects spoke more than 30 minutes
a day.
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Table 2. Average daily speaking time (in minutes) in eight types of speaking situations of 15
stutterers before and after therapy.

                               Mean                    Mean                     Number with   Number with   Number with
                               Speaking                Speaking                 Increased     Decreased     No Speaking
                               Time                    Time                     Speaking      Speaking      Time Both
                               Before                  After                    Time After    Time After    Before and
Situation                      Therapy   Range         Therapy   Range          Therapy       Therapy       After Therapy
Public Speaking                  .21     0-3.33          2.04    0-28.33           4             1             10
Telephone Calls to Strangers     .20     0-.94            .75    0-4.22           10             2              3
Classroom                        .57     0-1.92           .93    0-6.66            5             7              3
Telephone Calls to Friends      1.16     0-2.33          3.19    0-9.00           13             1              1
Talking with Strangers          2.67     1-11.53         3.32    .18-12.77         7             8              0
Talking with Family             4.43     0-41.67         5.73    0-60.00           5             0             10
Group Speaking                  3.56     0-12.28        10.05    0-38.56          12             2              1
Talking with Friends           14.45     3.33-42.55     18.41    5.88-36.56       11             4              0








Table 2 shows the mean speaking 
time of the stutterers in eight types 
of speaking situations. The ranges of 
the speaking times for each of the 
eight situations are given in this table 
together with the number of stutterers 
who, after therapy, showed an in- 
crease in speaking time, a decrease in 
speaking time or who had zero speak- 
ing time both before and after ther- 
apy. An examination of this table 
shows that in the situations talking 
with friends, group speaking, tele- 
phone calls to friends and telephone 
calls to strangers at least two-thirds of 
the stutterers showed an increase in 
speaking time. In the remaining four 
situations of public speaking, class- 
room, talking with strangers, and talk- 
ing with family less than half showed 
an increase. 


As already noted, the average speak- 
ing time of the stutterer changed after 
therapy from 26.8 to 44.4 minutes, an 
increase of 17.6 minutes. Table 2 
shows that approximately 10.5 minutes 
or 59 percent of the increase occurred 
in two situations: talking with friends 
and group speaking. 
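
The arithmetic behind the 59 percent figure can be checked directly from the Table 2 means. The short sketch below uses only values given in the text and in Table 2; the variable names are introduced here for illustration.

```python
# Mean daily speaking time (minutes) before and after therapy, from Table 2.
TABLE_2_MEANS = {
    "talking with friends": (14.45, 18.41),
    "group speaking": (3.56, 10.05),
}
# Overall change in mean daily speaking time, from the text (26.8 to 44.4).
TOTAL_INCREASE = 44.4 - 26.8

# Sum the before-to-after gain in the two situations.
gain = sum(after - before for before, after in TABLE_2_MEANS.values())
share = 100 * gain / TOTAL_INCREASE
print(f"{gain:.2f} of {TOTAL_INCREASE:.1f} minutes ({share:.0f} percent)")
```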


Discussion 


If Johnson’s (1) hypothesis that 
speaking time is a fundamental indi- 
cator of the degree to which stutterers 
are handicapped by their stuttering is 
a valid one, then the results of this 
study indicate that in terms of his 
criteria the stutterers are less handi- 
capped by their stuttering at the end 
of a three-month therapy period than 
at the beginning. 
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There are at least three explanations 
for the increase in speaking time. First, 
because of a belief in its beneficial 
therapeutic effects, some attempt is 
made to persuade the stutterers at- 
tending the University of Iowa speech 
clinic to increase their speaking time. 
The increase noted in this experiment 
may be due to this persuasion. If this
were the reason, then it would be ex-
pected that when the pressure to speak
more was removed, as it would be
when the stutterer left the clinic,
the speaking time would lapse back to
about what it was when the stutterer
arrived at the clinic.


Second, since one of the chief aims 
of therapy is to reduce the strength of 
the stutterer’s fear regarding his stut- 
tering, it may be that the increase in 
speaking time was the result of this 
fear reduction rather than of any pres- 
sure exerted to increase the speaking 
time. 


Third, there may be a tendency for 
people in general to increase their 
speaking time as they become more 
familiar with more people in a new 
environment. When the stutterers ar- 
rived at the Iowa University Clinic, 
they were in a comparatively strange 
environment and knew very few 
people. The longer they stayed, the 
more people they probably became 
acquainted with, thus increasing their 
opportunities for speaking. The stut-
terers were given three weeks to ad-
just to their surroundings before the
speaking time records were collected,
but this amount of time may not have
been long enough.


Summary 

The purpose of this experiment was 
to determine whether stutterers are 
less handicapped in terms of the cri- 
terion of speaking time at the end of a 
period of speech therapy than at the 
beginning. 

Fifteen stutterers were asked to 
keep speaking time records for three 
days both at the beginning and at 
the end of a three-month period of 
therapy. The average speaking time 
at the beginning was compared with 
that at the end of therapy. 

Thirteen of the 15 stutterers showed 
an increase in speaking time at the 
end of therapy. The average increase 
in speaking time from 26.8 to 44.4
minutes was significant at the one 
percent level. 
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Switching Transients And 
Threshold Determination 


Herbert N. Wright 


When pure tones are turned on or off 
without the appreciable rise-decay 
time incorporated in audiometers, 
spurious transient energy is introduced 
which complicates the total stimulus 
pattern. These switching transients 
are easily heard as prominent clicks 
at supra-threshold levels both before 
and after the tone. They could re- 
sult in an artificial improvement of 
measured auditory acuity for pure 
tones even though the transient en- 
ergy is of less magnitude than the pure 
tone itself. The transient energy gen- 
erated by abrupt switching could ex- 
tend into a region where the ear is 
more sensitive and consequently effect 
an artificial and spurious threshold re- 
sponse. For example, the switching 
transients present in an 8000 cps tone 
with an instantaneous rise and decay 
may affect the measured threshold 
because of the slope of the threshold 
curve. The transients generated by 
abrupt switching would extend into 
the lower frequency region where the 
ear is more acute. A comparable con- 
dition is also apparent at 125 cps where 
the energy spread to higher frequen- 
cies is of concern. 
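
The spread of energy described above can be illustrated numerically. The sketch below is not a model of the apparatus: it assumes linear on and off ramps (the actual envelope of the electronic switch is not given), an arbitrary sample rate, and a single probe frequency near 2000 cps. It compares the level reaching that lower-frequency region from an 8000 cps tone burst gated with a 1/4-cycle rise against the same burst with a 100-millisecond rise.

```python
import math

FS = 64_000        # sample rate in Hz (illustrative assumption)
TONE_HZ = 8000     # test tone from the example in the text
DURATION_S = 0.5   # 500-millisecond tone burst
PROBE_HZ = 2001    # offset from 2000 Hz, which would sit on a spectral
                   # null of an abruptly gated 500 ms burst


def gated_tone(rise_s):
    """Sine burst starting at zero phase with linear ramps of rise_s seconds."""
    samples = []
    for n in range(int(FS * DURATION_S)):
        t = n / FS
        envelope = min(1.0, t / rise_s, (DURATION_S - t) / rise_s)
        samples.append(envelope * math.sin(2 * math.pi * TONE_HZ * t))
    return samples


def level_db(samples, probe_hz):
    """Level in db, re a steady full-scale tone, of the burst at probe_hz."""
    re = im = 0.0
    for n, x in enumerate(samples):
        phase = 2 * math.pi * probe_hz * n / FS
        re += x * math.cos(phase)
        im -= x * math.sin(phase)
    amplitude = 2 * math.hypot(re, im) / len(samples)
    return 20 * math.log10(max(amplitude, 1e-15))


abrupt_db = level_db(gated_tone(0.25 / TONE_HZ), PROBE_HZ)
gradual_db = level_db(gated_tone(0.100), PROBE_HZ)
print(f"1/4-cycle rise: {abrupt_db:.0f} db   100 ms rise: {gradual_db:.0f} db")
```

Under these assumptions the abruptly gated burst leaves far more energy near 2000 cps than the gradually gated one, although in both cases the level is well below that of the tone itself.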
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A pure tone stimulus confounded 
by switching transients at these ex- 
treme frequencies provides the max- 
imal condition for the artificial im- 
provement of threshold. The minimal 
effect would be between 1000 and 
3000 cps where the normal ear is most 
sensitive. Similarly, these switching 
transients can be expected to have 
greater influence on the measured 
threshold of individuals with precipi- 
tous hearing losses. Here the slope of 
the sensitivity curve of the impaired 
ear is greater than that of the normal 
ear. Consequently, the transient en- 
ergy generated by abrupt switching 
more closely approximates the indi- 
vidual threshold contour. 


The primary purpose of this in- 
vestigation was to determine the in- 
fluence of these switching transients 
on the thresholds for pure tones in 
both normal and impaired ears. A 
second aim was to describe the limit- 
ing conditions for this influence on 
pure tone thresholds. This was ap- 
proached by obtaining threshold at 
nine frequencies and at five different 
rise-decay times on individuals with 
normal hearing and on individuals
with impaired hearing. 


Apparatus 


Figure 1 shows a simplified block 
diagram of the equipment. The signal 
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Figure 1. Simplified block diagram of the
apparatus. 


source was an audio-oscillator (Gen- 
eral Radio, type 1304A) whose out- 
put passed through an electronic 
switch (Grason-Stadler, type 829-S) 
controlled by an electronic timer 
(Grason-Stadler, type 470). 

From the electronic switch, the 
signal passed through two attenuators. 
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Figure 2. Fast rise-decay times.
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The first was controlled by the ex- 
perimenter and had a range of 110 db 
in one db steps. The second was con- 
trolled by the subject and had a range 
of 30 db in one db steps. From this 
second attenuator, the signal passed 
directly to a PDR-8 earphone mounted 
in a doughnut cushion on a Navy 
aviation type headset. 


A counter-timer (Berkeley, type 
5500) was initially used to calibrate 
the duration of the signal to 500 milli- 
seconds with a 50 per cent duty cycle. 
The number of cycles in the rise and the
starting phase of the switched signals
were monitored with an oscilloscope
(Dumont, type 304-A). The electronic
switch and
timer combination was adjusted at 
each frequency tested to provide a 
train of repeated short tones which 
always began at the time axis. 


Figure 2 shows the ‘fast’ rise-decay
times used at each frequency. These
times were 1/4 cycle, 1 1/4 cycles, 2 1/4
cycles and 5 1/4 cycles. The non-tran-
sient confounded stimulus had a rise-
decay of approximately 100 milli-
seconds. This was obtained by adjust-
ing a 50-cycle tone to a rise-decay of
5 1/4 cycles and then changing the
oscillator to the test frequency.
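
Because the fast rise-decay times are specified in cycles, their duration in milliseconds depends on the test frequency. The sketch below converts cycles to milliseconds for the nine test frequencies used in this study; the function name rise_time_ms is introduced here for illustration.

```python
TEST_FREQUENCIES_HZ = [125, 250, 500, 1000, 2000, 3000, 4000, 6000, 8000]
FAST_RISE_CYCLES = [0.25, 1.25, 2.25, 5.25]


def rise_time_ms(cycles, frequency_hz):
    """Duration in milliseconds of a rise lasting `cycles` periods of the tone."""
    return 1000.0 * cycles / frequency_hz


# One row per test frequency, one column per fast rise-decay time.
for f in TEST_FREQUENCIES_HZ:
    times = ", ".join(f"{rise_time_ms(c, f):.3f}" for c in FAST_RISE_CYCLES)
    print(f"{f:>5} cps: {times} ms")

# The slow setting: 5 1/4 cycles of a 50-cycle tone.
print(rise_time_ms(5.25, 50))
```

At 50 cps, 5 1/4 cycles last 105 milliseconds, which is consistent with the 100-millisecond rise-decay time quoted for the non-transient stimulus.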


Procedure 


Two groups of subjects were tested. 
Five individuals with normal hearing 
made up the first or control group. 
The second or experimental group 
consisted of six individuals with vary- 
ing degrees of hearing loss of the type 
induced by prolonged noise exposure. 
No normal hearing subject of the con- 
trol group had a loss greater than five 
decibels at any of the test frequencies. 
All subjects were experienced listeners 
in auditory experiments. 


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Figure 3. Threshold improvement in db at each fast rise-decay time for hearing-impaired
Subject A.

All subjects obtained their thresholds
at each test frequency five times
for each rise-decay time. Each sub- 
ject was instructed to adjust his at- 
tenuator to the lowest point at which 
he could hear anything. Adherence to 
this response definition obviated the 
confounding aspect of tonality. The 
experimenter’s attenuator was arbi- 
trarily changed between adjacent 
threshold adjustments to guard against 
any systematic biasing of results from 
a particular attenuator position. The 
mean attenuation of these five thresh- 
old adjustment trials defined each sub- 
ject’s threshold at each frequency for 
each of the five rise-decay times. 

The frequencies 125, 250, 500, 1000, 
2000, 3000, 4000, 6000 and 8000 cps 
were investigated at each rise-decay 
time. The order of presentation of the 


test frequencies was random for each 
subject as were the rise-decay times 
following the selection of a test fre- 
quency. 

Each subject was seen in 10 ex- 
perimental sessions. In the first session 
a standard clinical audiometric ex- 
amination was administered. These 
test results were used to define the 
experimental and control groups. In 
each of the remaining nine sessions, 
each subject obtained his thresholds 
at one frequency for each of the five 
rise-decay times. At least one hour 
intervened between the end of one ex- 
perimental session and the beginning 
of the next. 

Following each threshold adjust- 
ment, the total attenuation for thresh- 
old was determined by simply adding 
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Figure 4. Threshold improvement in db at each fast rise-decay time for hearing-impaired 


Subject B. 


the amount of attenuation in the sub- 
ject’s and the experimenter’s attenua- 
tors. The experimenter then put maxi- 
mum attenuation in his 110 db 
attenuator and the subject returned his 
30 db attenuator to 10. Following this, 
the experimenter adjusted his attenua- 
tor to a value different from his pre- 
vious setting. The subject then made 
another threshold adjustment. There 
was a slight rest while the rise-decay 
time of the signal was changed be- 
tween adjacent blocks of five thresh- 
old adjustments. Each subject was en- 
couraged to take as much time as he 
felt necessary for his threshold deter- 
minations. 


Results 


The total attenuation for threshold 
at each of the nine frequencies for 
each rise-decay time was used in com- 
paring the experimental and control 
groups. 

The audiogram of each subject in 
the hearing loss group was determined 
with tones having a 100-millisecond 
rise-decay time. These audiograms are 
shown in Figures 3 through 8. The 
zero reference level for these audio- 
grams was the median attenuation of 
the normal hearing group to obtain 
threshold for the train of 500-milli- 
second tones with a rise-decay of 100 
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Table 1. Medians and ranges in db of threshold improvement for the normal hearing group (N = 5).
Medians are given in the first row and ranges in the second row for each rise-decay time.

Rise-Decay                          Test Frequency
Time            125   250   500  1000  2000  3000  4000  6000  8000
1/4 cycle
  Median         .4    .6    .4    .6    .4    .6   -.6    .4   1.0
  Range         3.2   1.6   1.2   6.2   4.0   5.4   6.6   5.2   7.6
1 1/4 cycles
  Median         .8   -.2    .8    .4    .6   1.2   1.2    .2    .2
  Range         1.6   4.0   1.0   6.2   3.8   5.6   6.6   4.0   3.4
2 1/4 cycles
  Median        1.0    .2    .4   -.4    .2   2.0   0.0   1.2    .4
  Range         2.2   5.6   3.6   2.8   3.8   5.6   4.0   5.0   7.0
5 1/4 cycles
  Median         .2    .4   1.0    .6    .6    .2   0.0    .4   0.0
  Range         3.2   1.0   2.6   4.0   4.6   5.6   9.4   7.4   6.8

milliseconds.

The measures of threshold improvement
for each subject were the differences
between his threshold for tones with a
100-millisecond rise-decay time and
his thresholds for tones
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Figure 5. Threshold improvement in db at each fast rise-decay time for hearing-impaired
Subject C.
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Figure 6. Threshold improvement in db at each fast rise-decay time for hearing-impaired 


Subject D. 


with the four ‘fast’ rise-decay times. 
The median and range of threshold 
improvement for the control group 
for each combination of frequency 
and rise-decay time may be found in 
Table 1. These results show that the 
median threshold improvement caused 
by the switching transients for the five 
subjects with normal hearing was 
negligible and had no systematic rela- 
tion to the amount of transient en- 
ergy present. Inspection of the range 
of results shows that the responses of 
the subjects constituting the normal 
hearing or control group were not ex- 
tremely variable. The results, then, 
may be interpreted to mean that these 
switching transients have no appreciable
effect upon threshold determination
in the normal ear.
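
The measures described in the Procedure and summarized in Table 1 can be sketched as follows. The readings are hypothetical and are not data from this study. Two assumptions are made: improvement is taken as positive when more attenuation is tolerated at threshold with the fast rise-decay time, and the range is taken as the largest value minus the smallest.

```python
import statistics


def threshold_db(trials):
    """Mean total attenuation over the trials.

    Each trial is (experimenter attenuator db, subject attenuator db).
    """
    return statistics.mean(exp_db + subj_db for exp_db, subj_db in trials)


def improvement_db(fast_trials, slow_trials):
    """Extra attenuation tolerated at threshold with the fast rise-decay time."""
    return threshold_db(fast_trials) - threshold_db(slow_trials)


# Hypothetical readings for one subject at one frequency (five trials each).
slow = [(70, 12), (65, 18), (72, 10), (68, 15), (60, 22)]  # 100 ms rise-decay
fast = [(70, 13), (66, 18), (71, 12), (68, 15), (61, 22)]  # 1/4-cycle rise-decay
print(round(improvement_db(fast, slow), 1))

# Hypothetical improvements for five subjects, reduced to the two rows
# reported for each rise-decay time in Table 1.
group = [0.4, -0.6, 1.2, 0.2, 0.8]
group_median = statistics.median(group)
group_range = max(group) - min(group)
print(group_median, round(group_range, 1))
```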

Figures 3 through 8 show the 
threshold improvement at each fre- 
quency and rise-decay time for each 
of the six subjects of the experimental 
group with varying amounts and de- 
grees of hearing loss as well as their 
associated audiograms. These results 
indicate that the influence of switch- 
ing transients is uniquely associated 
with the impaired ear under considera- 
tion. The amount of threshold im- 
provement caused by the switching 
transients appears to be related to the 
contour of the hearing loss and the 
degree of hearing loss as well as to 
the amount of transient energy present
in the abruptly switched signal.

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Figure 7. Threshold improvement in db at each fast rise-decay time for hearing-impaired
Subject E.

Each of these factors will now be 
considered in greater detail as they are 
exemplified by the hearing impaired 
group. 

First, four of the six subjects con- 
stituting the experimental group (Fig- 
ures 3, 4, 5, and 6) showed sizeable 
threshold changes between 2000 and 
8000 cps. It is here that these impaired 
ears were more readily stimulated by 
the spread of transient energy gen- 
erated by abrupt switching. The great- 
est effect of the switching transients, 
when it occurred, is seen to approxi- 
mate the point of greatest hearing loss 
for each of these hearing impaired 
subjects. Moreover, it is notable that
no subject in the hearing impaired
group showed appreciable shifts be-
tween 125 and 1000 cps where his
hearing was comparable to that of the
normal hearing or control group.

Second, the influence of switching
transients is seen to decrease sys-
tematically as the rise-decay time is
made longer (Figures 3, 4, 5, and 6).
The switching transients present in
those pure tones with a rise-decay of
5 1/4 cycles did not appreciably affect
the measured threshold in any one of
the six hard-of-hearing subjects.

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Figure 8. Threshold improvement in db at each fast rise-decay time for hearing-impaired
Subject F.

Finally, the amount of hearing loss
also appears to be a contributing fac-
tor. Two of the hearing impaired sub-
jects (Figures 7 and 8) did not show
the systematic effects shown by the
other four. Although the hearing loss 
contour of these two subjects was 
comparable to the other four in the 
experimental group, the maximum 
hearing loss was less. In this investiga- 
tion the switching transients did not 
systematically affect threshold deter- 
mination until the maximum hearing 
loss was greater than 40 db. 


Discussion 


The results summarized above show
that, although the switching transients
present in pure tone signals with rapid
rise-decay times are quite apparent at
supra-threshold levels, the degree to
which they affect the measurement of
threshold can be relatively slight. On
the basis of the present limited sample,
it would appear that the transients
present in a train of repeated short
tones with a 1/4-cycle rise-decay do
not significantly affect the measure-
ment of threshold in the normal ear.
In the case of impaired ears, the re-
sults suggest a similar conclusion for
a rise-decay of at least 5 1/4 cycles. The
threshold improvement effected by 
switching transients in a train of re- 
peated pure tone stimuli was found to 
be related to the hearing loss contour, 
the degree of hearing loss and the 
relative amount of transient energy 
present. Additional study of the rela- 
tive importance of these variables af- 
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fecting the measured threshold is in- 
dicated before further generalization is 
warranted. It should be emphasized 
that these present results were ob- 
tained on a limited number of ex- 
perienced subjects and apply to thresh- 
old measurement only where the pure 
tone begins at the time axis. In addi- 
tion, they do not imply that the 
observed effects of switching tran- 
sients apply to loudness measures. 


Summary 


Thresholds for 500-millisecond
tones beginning at the time axis and
rising to maximum amplitude in 1/4
cycle, 1 1/4 cycles, 2 1/4 cycles, 5 1/4
cycles and 100 milliseconds were de-
termined at 125, 250, 500, 1000, 2000,
3000, 4000, 6000 and 8000 cps by the
method of adjustment in both normal
and impaired ears. Results indicate that
the switching transients have no sig-
nificant influence upon threshold de-
termination in normal ears. In impaired
ears, the influence of switching tran-
sients on threshold determination was
found to be related to the contour of
the hearing loss, the degree of hear-
ing loss and the relative amount of
transient energy present. In general,
the switching transients were found
to have a decreasing influence as the
rise-decay time was made longer and
no significant influence on the thresh-
old for impaired ears when the pure
tone signals had a rise-decay of at
least 5 1/4 cycles.





Individual Ratings Of Severity 
Of Moments Of Stuttering 


Dorothy Sherman 
Richard McDermott 


The psychological scaling method of 
equal-appearing intervals is useful for 
obtaining measures of the severity of 
stuttering for short speech samples 
(2, 3). The problem of observer time, 
however, may often preclude the use 
of the method if a group of observers 
is required. That the responses of a 
single observer can yield reliable scale 
values of severity for samples of 
speech several minutes long has been 
established (4). 


Experimental study of stuttering be- 
havior sometimes requires an index 
of the severity of the audible charac- 
teristics of individual moments of 
stuttering. As previously mentioned, 
obtaining scale values from the re- 
sponses of a group of observers may 
be impractical. A more efficient meth- 
odology for obtaining reliable scale 
values of severity of individual mo- 
ments of stuttering is needed. The 
purpose of this study was to examine 
the reliability of mean scale values of 
severity of individual moments of 
stuttering obtained from the responses 
of a single observer. 





Dorothy Sherman (Ph.D., Iowa, 1951) 
is Associate Professor of Speech Pathology 
and Audiology, and Richard McDermott 
(M.A., Ohio State University, 1955) is Au- 
diology Fellow, State University of Iowa. 
This article reports an experiment com- 
pleted at The Ohio State University. 


Volume 1, No. 1 




Procedure 


Experimental Material. Tape-re- 
corded three-minute readings of prose 
material by 22 adult stutterers were 
available from a previous experiment 
(4). These recordings provided 20 
experimental speech samples for rating 
of individual moments of stuttering 
and two speech samples for practice 
of the experimental task. 

Also available from a previous ex- 
periment (5) was a recorded training 
tape constructed for the purpose of 
training observers to rate the severity 
of audible characteristics of individual 
moments of stuttering. The items in- 
cluded in the training tape were se- 
lected from five repeated readings of 
a 500-word passage of prose material 
by each of 20 stutterers, 100 readings 
in all (7). The individual moments of 
stuttering in these 100 recorded read- 
ings had been judged with respect to 
severity on a one- to nine-point scale 
by 11 sophisticated, trained observers. 
A total of 45 individual moments of 
stuttering, five at each of the nine 
levels of severity, were selected. In 
no case did the range of the ratings 
made by the 11 observers on the se- 
lected moments of stuttering exceed 
three scale units. The mean scale value 
derived from the 11 ratings was in 
each instance within plus or minus .18 
of the level of severity it was chosen 
to represent. 
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The 45 sentences within which the 
selected individual moments of stut- 
tering occurred were arranged in five 
sets of nine sentences each. In each 
of the five sets of sentences the se- 
lected individual moments of stutter- 
ing occurred in order from the least 
to the most severe with one selected 
moment at each of the nine levels of 
severity. 


Written Transcription of Recorded 
Samples. A copy of the material read 
by each stutterer was modified to in- 
clude the reading errors and any 
deviations from the written text. The 
rater was thus provided with accurate 
transcriptions of the material heard 
from the recorded samples. Repeti- 
tions and extraneous words which 
appeared to be associated with stutter- 
ing moments were omitted. 


Identification of Stuttered Words. 
The 22 three-minute samples were 
played to two judges who independ- 
ently marked what they believed to 
be the stuttered words. Each judge 
had before him a transcription of the 
material that each stutterer had read. 
This procedure was repeated a second 
time by the same two judges. Any 
word that had been marked at least 
twice out of the possible four times 
was regarded for the purposes of this 
experiment as a stuttered word. That 
is, any word marked once or twice 
by both judges, or twice by only one 
judge, was included. 
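
The inclusion rule can be stated compactly: a word counts as stuttered when it is marked on at least two of the four listening passes (two judges, two passes each). The sketch below uses hypothetical word positions, and the function name stuttered_words is introduced here for illustration.

```python
from collections import Counter


def stuttered_words(markings, min_marks=2):
    """Return word positions marked on at least min_marks passes.

    markings: one collection of marked word positions per listening pass
    (here two judges x two passes = four collections).
    """
    counts = Counter(pos for marked in markings for pos in set(marked))
    return sorted(pos for pos, n in counts.items() if n >= min_marks)


# Hypothetical positions marked on each pass (not data from the study).
judge_1 = [{3, 10, 17}, {3, 17}]
judge_2 = [{3, 25}, {10, 31}]
selected = stuttered_words(judge_1 + judge_2)
print(selected)
```

In this example word 17 is marked twice by one judge only and word 10 once by each judge; both are included, while words 25 and 31, each marked once, are not.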


Selection of Stuttered Words to Be 
Rated. Twenty stuttered words in 
each of the 20 experimental samples 
and the two practice samples were 
selected at random as those to be rated 
by the observers. One restriction was 
placed upon the random selection; no 
word was selected adjacent to another
word previously chosen. One excep-
tion to this restriction was made for
a sample only 39 words long.
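
A restricted random selection of this kind might be sketched as below. The exact drawing procedure is not described, so this is one possible reading: words are drawn one at a time, and a draw is rejected when it falls immediately before or after a word already chosen, with adjacency judged by position in the transcription. The positions are hypothetical, and select_for_rating is a name introduced for this sketch.

```python
import random


def select_for_rating(stuttered_positions, k=20, seed=None, max_tries=10_000):
    """Draw k stuttered-word positions at random, rejecting any draw that is
    adjacent (position +/- 1) to a word already chosen."""
    rng = random.Random(seed)
    candidates = list(stuttered_positions)
    chosen = set()
    tries = 0
    while len(chosen) < k:
        tries += 1
        if tries > max_tries:
            # The restriction cannot be met, as for the very short sample
            # mentioned in the text, so an exception would have to be made.
            raise ValueError("could not satisfy the adjacency restriction")
        pick = rng.choice(candidates)
        if pick in chosen or pick - 1 in chosen or pick + 1 in chosen:
            continue
        chosen.add(pick)
    return sorted(chosen)


# Hypothetical stuttered-word positions in one three-minute sample.
positions = sorted(random.Random(1).sample(range(1, 451), 120))
rated = select_for_rating(positions, k=20, seed=7)
print(rated)
```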


Preparation of Practice Tapes. Two 
sets of nine sentences each from the 
recorded training tape with the se- 
lected individual moments of stut- 
tering arranged in each set from the 
least to the most severe were used to 
familiarize the observers with the 
scale. The remaining 27 sentences 
were randomized with respect to 
severity of the selected individual 
moments of stuttering. The purpose of 
the randomization was to provide 
practice on rating individual moments 
with previously established levels of 
severity. 


Selection of Observers. Ten ob- 
servers rated the experimental samples 
with respect to stuttering severity. All 
were trained in speech therapy and all 
had had clinical experience with stut- 
terers. 


Scaling Method. The stuttered 
words were scaled by the method of 
equal-appearing intervals, employing a 
nine-point scale extending from one 
for least severe stuttering to nine for 
most severe stuttering. 


Practice Procedure. The observers 
learned and practiced the experimental 
task during a practice session of ap- 
proximately one and one-half hours 
the evening before the experimental 
samples were rated. The two sets of 
nine sentences with the selected in- 
dividual moments in each set arranged 
in order from the least to the most 
severe were presented to the observers 
twice. The observers then rated the 
27 training sentences which had been 
randomized with respect to the sever- 
ity of the selected individual moments 





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of stuttering. Observers then com- 
pared the severity ratings they had 
assigned to the indicated individual 
moments of stuttering with the previ- 
ously established levels of severity 
which were announced by the experi- 
menter. This procedure was repeated. 

The two three-minute practice sam- 
ples were then presented to the ob- 
servers. The observers first listened to 
the samples and then, as they listened 
a second time, rated the severity of the 
stuttering on the individual words 
previously selected for rating. 

Rating the Experimental Tapes. The 
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experimental rating session was pre- 
ceded by a brief training session. The 
observers heard once again the two 
sets of nine sentences with the selected 
moments arranged in the order of 
severity. They also rated the severity 
of the selected individual moments of 
stuttering in the remaining 27 sen- 
tences which had been randomized 
with respect to severity. Each ob- 
server then compared his ratings to 
the previously established levels of 
severity announced by the experi- 
menter after each sentence was pre- 
sented once again. 


Table 1. Summaries of analyses of variance testing for differences among means of severity of
individual moments of stuttering.

Source              df       ss       ms       F*      p†
First five moments
  Observers          9     6.65      .74     5.29    .005
  Stutterers        19   314.04    16.53
  O x S            171    24.17      .14
  Total            199   344.86
First 10 moments
  Observers          9     6.01      .67     5.58    .005
  Stutterers        19   216.10    11.37
  O x S            171    20.60      .12
  Total            199   242.71
First 15 moments
  Observers          9     5.61      .62     6.89    .005
  Stutterers        19   176.23     9.28
  O x S            171    15.11      .09
  Total            199   196.95
Entire 20 moments
  Observers          9     6.20      .69     8.62    .001
  Stutterers        19   181.16     9.53
  O x S            171    14.01      .08
  Total            199   201.37
Last five moments
  Observers          9     7.79      .87     5.44    .005
  Stutterers        19   284.66    14.98
  O x S            171    27.26      .16
  Total            199   319.71

*F = ms_O / ms_OxS
†p = point in the F distribution
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Table 2. Pearson r’s* obtained for evaluating the reliability of single observers.

Moments of                            Observer
Stuttering      A     B     C     D     E     F     G     H     I     J
First five     .95   .97   .96   .97   .97   .93   .92   .92   .99   .99
Entire 20      .95   .98   .96   .98   .98   .92   .94   .95   .98   .98
Last five      .95   .96   .97   .96   .96   .93   .93   .94   .95   .96









* Each set of 20 mean scale values derived from the responses of a single observer was correlated 
with a set of means based upon the corresponding sets of values for the other nine observers. 


The 20 three-minute experimental 
tapes were presented and the observers 
rated the individual moments of stut- 
tering which were indicated on their 
copies of the transcription of each 
sample. 


Results


Scale Values. Twenty individual 
moments of stuttering in each of 20 
recorded three-minute samples from 
the speech of 20 stutterers were scaled 
on a nine-point equal-appearing inter- 
vals scale, with one representing least 
severe and nine representing most se-
vere. Mean scale values were obtained
from the judgments of each of 10 ob- 
servers on the first five individual 
moments of stuttering in each sample, 
the first 10 moments, the first 15 mo- 
ments, the entire 20 moments and the 
last five moments. 


Results of Statistical Analysis. The 
intraclass correlation technique for 
evaluating the reliability of individual 
ratings as described by Ebel (1) was 
applied to the resultant mean scale 
values. As indicated by the analyses of 
variance reported in Table 1, ob- 
servers differed significantly in gen- 
eral level of rating. For this reason 
the formula which removes the be-
tween observers variance in estimating
the reliability of ratings was em-
ployed. Differences among observers
in general level of rating thus did not
affect the obtained intraclass correla-
tion coefficients¹: .92, .90, .91, .92 and
.90 for the first five, the first 10, the
first 15, the entire 20 and the last
five moments, respectively. Individual
mean scale values are thus about 
equally reliable for means based upon 
five, 10, 15, or 20 responses. The re- 
liability coefficient for mean scale 
values of the last five moments was 
lower than for the first five. Reliabil- 
ity of individual mean scale values 
thus did not increase with practice. 
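
The computation can be sketched from the sums of squares in Table 1. The formula is not printed in the text; the sketch assumes the usual form of Ebel's estimate for the reliability of a single rating with between-observers variance removed, r = (ms_S - ms_OxS) / (ms_S + (k - 1) ms_OxS), where k is the number of observers. The function name ebel_reliability is introduced here for illustration.

```python
def ebel_reliability(ss_stutterers, ss_error, n_stutterers, n_observers):
    """Reliability of individual ratings, between-observers variance removed.

    Assumed form: (ms_S - ms_OxS) / (ms_S + (k - 1) * ms_OxS).
    """
    ms_s = ss_stutterers / (n_stutterers - 1)
    ms_e = ss_error / ((n_stutterers - 1) * (n_observers - 1))
    return (ms_s - ms_e) / (ms_s + (n_observers - 1) * ms_e)


# (ss for stutterers, ss for O x S) for each analysis, from Table 1.
TABLE_1_SS = {
    "first five": (314.04, 24.17),
    "first 10": (216.10, 20.60),
    "first 15": (176.23, 15.11),
    "entire 20": (181.16, 14.01),
    "last five": (284.66, 27.26),
}

coefficients = {
    label: round(ebel_reliability(ss_s, ss_e, 20, 10), 2)
    for label, (ss_s, ss_e) in TABLE_1_SS.items()
}
for label, r in coefficients.items():
    print(f"{label}: {r:.2f}")
```

With 20 stutterers and 10 observers, this form gives values that round to the coefficients reported above.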


Reliability was also evaluated for 
each of the 10 observers separately. 
Each set of mean scale values derived 
from the responses of a single observer 
was correlated with a set of means of 
the corresponding mean values derived 
from the responses of the other nine 
observers. The range of obtained Pear-
son r’s, which are reported in Table
2, was from .92 to .99. The placement 
of the means in relative positions along 
the severity dimension was thus evi- 
dently quite precise for each indi- 
vidual observer. A comparison of the 


¹Mean squares employed in the compu-
tation are from the analyses reported in
Table 1.
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Figure 1. Distribution of 225 differences 
among observer means. 


r’s for the first five moments with the 
corresponding r’s for the last five
moments indicates that practice had 
no important effect upon reliability 
for any individual observer. 
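
The single-observer correlations of Table 2 follow a leave-one-out pattern: each observer's set of means is correlated with the averages of the corresponding means from the remaining observers. The sketch below uses hypothetical mean scale values for three observers and four stutterers rather than the study's 10 observers and 20 stutterers; the function names are introduced here for illustration.

```python
import math
import statistics


def pearson_r(x, y):
    """Product-moment correlation between two equal-length sequences."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def single_observer_r(means_by_observer, observer):
    """Correlate one observer's per-stutterer means with the average of the
    corresponding means from all other observers."""
    own = means_by_observer[observer]
    others = [v for name, v in means_by_observer.items() if name != observer]
    pooled = [statistics.mean(column) for column in zip(*others)]
    return pearson_r(own, pooled)


# Hypothetical mean scale values (not data from the study).
means = {
    "A": [2.0, 4.5, 6.0, 8.0],
    "B": [2.5, 4.0, 6.5, 7.5],
    "C": [3.0, 5.0, 6.0, 8.5],
}
for name in means:
    print(name, round(single_observer_r(means, name), 2))
```

Because the correlation ignores differences in level, an observer who rates uniformly higher or lower than the others can still obtain a high r, which is why level differences are examined separately.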

Observers, as already noted, differed 
significantly in general level of rating: 
that is, they differed with respect to 
placing the means in absolute positions 
along the severity dimension. 


Many of the differences among ob-
server means were significant.² The
differences required for significance 
were, however, relatively small in 
terms of scale units. The largest ob- 
tained critical difference was .58. The 
distribution of the 225 obtained differ- 
ences is presented graphically in 
Figure 1, with mid-points of intervals 
for one scale unit of severity given 
along the abscissa and the per cent of 
the total number of differences along 
the ordinate. Twenty-eight per cent 


²The critical difference required for sig-
nificance was computed as follows:
$\text{c.d.} = t\,(2\,ms_{O \times S}/s)^{1/2}$, where s = number of
stutterers and df = 171.
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of the differences were under .10; only 
four per cent of the differences fell 
in the highest group between .50 and 
.59. An examination of Figure 1 makes 
readily apparent the fact that most of 
the differences were quite small. 


Discussion 


The demonstrated reliability of 
mean scale values of severity derived 
from single observer judgments of in- 
dividual moments of stuttering has 
practical implications both clinically 
and experimentally. 


Experimentally, questions are often 
concerned with the effect of certain 
stimulus conditions upon the severity
of stuttering. With scale values of 
severity for individual moments of 
stuttering it is possible to study the 
effect upon severity of such factors as 
semantic influences, phonetic influ- 
ences, types of stuttering behavior, 
length of words, position of words in 
the sentence or part of speech. Also 
a reliable measure may be needed to 
study the relationship between chang- 
ing stimulus conditions and categories 
of a particular classification of individ- 
ual moments of stuttering. For ex- 
ample, the trend of severity over a 
particular classification of individual 
moments of stuttering might vary 
with the readings of an adaptation 
series. A study of the interaction be- 
tween semantic influences, for in- 
stance, and the readings of an adapta- 
tion series might provide useful 
information for explaining stuttering 
behavior during the adaptation proc- 
ess. 


Clinically, the method may be 
utilized both as a diagnostic tool and 
as a measure of progress. For a given 
stutterer, knowledge of the trends of 
severity over various classifications of 
the individual moments of stuttering
could provide useful clues to his problem.
Variation of severity from one
category to another might provide a
useful guide for therapy, both in the
direct management of behavioral reactions
and in changing the stutterer’s
attitude toward his problems.


Progress in terms of reduction of 
severity could be measured for any of 
the categories of individual moments 
of stuttering found to have an impor- 
tant effect on severity for a particular 
stutterer. Because of the expected 
variability in severity level from situ- 
ation to situation, and from day to 
day, many measures of severity level 
would be required. The establishment 
of the reliability of single observers 
substantially reduces the total observer 
time needed for obtaining such meas- 
ures. 


The reliability of measures from the 
responses of any given observer must 
be established before such measures 
can be utilized. The reliability of a 
single observer could be established 
by a repetition of the procedure 
which has been described. The reli- 
ability of single observers could also 
be established through utilization of 
the recorded severity scales and the 
recorded experimental samples em- 
ployed in this study. The obtained 
mean scale values derived from the 
responses of the individual observer 
for the selected moments of stuttering 
in the 20 experimental samples could 
then be correlated with the previously 
established values. 


Summary 


The purpose of this study was to 
examine the reliability of mean scale 
values of severity of individual mo- 
ments of stuttering derived from the 
responses of a single observer. 


Ten observers were trained to identify
severity levels of individual moments
on a nine-point equal-appearing
intervals scale with a previously con- 
structed tape-recorded severity scale. 
Twenty individual moments of stut- 
tering in each of 20 three-minute 
recordings of the speech of stutterers 
were judged on a one- to nine-point 
scale. Mean scale values of severity 
were obtained from the judgments of 
each observer on the first five, first 10, 
first 15, the entire 20 and the last five 
moments of stuttering in each of the 
20 recordings. 


On the basis of the results of statis- 
tical analysis of the obtained data the 
following statements may be made: 


(1) Satisfactorily reliable mean 
scale values of severity of individual 
moments of stuttering can be derived 
from the responses of a single ob- 
server. 


(2) Under the conditions of this
experiment mean scale values of severity
of individual moments of stuttering
derived from responses of an individual
observer to five, to 10, to 15
and to 20 moments of stuttering are
apparently equally reliable.


(3) Absolute values of obtained 
scale positions of severity of individu- 
al moments of stuttering are not neces- 
sarily comparable from one observer 
to another.


Articulation Problems Of A Group
Of Cleft Palate Adults


Betty Jane McWilliams 


There is general agreement that the 
cleft palate person is likely to have 
articulatory problems and that his 
therapeutic program will eventually 
have to include some attention to his 
articulation. However, steps to de- 
scribe in detail the articulatory deficits 
of cleft palate individuals have been 
taken only recently. 


Spriestersbach, Darley and Rouse 
(8) investigated the articulatory pat- 
terns of 25 children with cleft palate 
ranging in age from three years and 
seven months to eight years and three 
months. They found that their subjects
had least difficulty with the consonants
[m], [n], [h], [j] and [ŋ].
The five most frequently misarticulated
consonants in order of decreasing
difficulty were [z], [θ], [s], [tʃ]
and [ʒ]. The children were generally
inconsistent in their misarticulation of 
consonants. Errors of omission were 
observed most frequently, followed 
by substitutions and distortions in that 
order. 


Betty Jane McWilliams (Ph.D., University
of Pittsburgh, 1953) is Assistant Professor
of Psychology and Speech at the
University of Pittsburgh and Director of
the Speech Clinic at Children’s Hospital.
This article is based in part on a doctoral
dissertation completed under the direction
of Professor Jack Matthews.


Volume 1, No. 1


The afore-mentioned study (8)
probably constitutes the most systematic
and detailed report found in
the literature. However, many other
writers have made general comments 
about the articulatory problems of 
cleft palate persons. Berry (1) pointed
out that clinicians should be con- 
cerned with the nasal character of all 
sounds but that the greatest difficulty 
would occur on consonants. Brown 
and Oliver (2) found that defective 
sounds and peculiarities of voice 
seemed to be logically related to in- 
adequacies of the speech mechanism. 
Eckelmann and Baldridge (4) suggested
that the sounds most likely to
be distorted or replaced would be [h],
[p], [b], [t], [d], [s], [tʃ], [f], [v],
[θ], [w], [k], [ð] and [g]. West,
Kennedy and Carr (10) indicated that
the plosives are the most seriously
defective consonants and that the
nasals [m], [n], and [ŋ] are frequently
substituted for the voiced
plosives. They added that other sounds 
made in the throat, nose, or mouth are 
often substituted for both the voiced 
and voiceless plosives. 

A thread of agreement runs through 
these discussions. It is evident that 
most writers have felt that plosives 
and sibilants are likely to be defective. 
However, there has been little re- 
search evidence to indicate which 
sounds present the most serious prob- 
lems or the manner in which they are 
likely to be defective. Our conclusions
have, for the most part, been arrived
at clinically. 


In an effort to learn more about the 
articulatory patterns of adults with 
cleft palate, the research herein re- 
ported posed the following question: 
What type of articulation errors tend 
to occur in a population of cleft palate 
adults? 


Subjects 


The subjects used in this research 
were 48 cleft palate adults between 
the ages of 17 and 59. The mean age 
of the group was 24.6 years, the 
median 22 years. The group was se- 
lected from a university clinic, a com- 
munity center, and a private practice. 
No attempt was made to control any 
variable such as classification of origi- 
nal cleft, amount or type of surgery, 
orthodontic intervention, presence or 
absence of a speech aid, or experience 
in a speech clinic. 


Procedure 


The procedure followed was re- 
ported in detail in an earlier article 
dealing with a different aspect of the 
same study (5). A part of the pro- 
cedure was the careful study of the 
consonant articulation patterns of 
these subjects. This was accomplished 
by having each of the 48 subjects 
record on tape one of 12 lists of 
words, phrases, and short sentences 
developed from materials originated 
by Dietze (3) and phonetically bal- 
anced in accordance with the inci- 
dence of consonant sounds found in 
general American conversation as re- 
ported by Travis (9). The lists were 
designed in such a way as to include 
the 23 most frequent consonant 
sounds in proportions comparable to 
their appearance in conversation. Each 
subject was tested on a total of 202
consonants. Table 1 is a summary of
the consonants tested and the frequency
with which they occurred.


TABLE 1. Consonant sounds tested through the
administration of a 202-item articulation test
to each of 48 cleft palate subjects.

Consonant   Frequency of     Percentage
Sound       Occurrence of    of Total
            Sound in Test    Test

p                6             2.8
b                6             2.9
m               10             5.2
t               24            12.0
d               13             6.3
n               21            10.4
k               10             5.1
g                5             2.7
ŋ                4             1.9
s               18             8.9
ʃ                3             1.3
tʃ               1            [illegible]
z                9             4.3
ʒ                0              .06
dʒ               1            [illegible]
f                5             2.4
v                5             2.4
θ                2              .9
ð                8             4.0
l               13             6.3
r               19             9.3
j                3            [illegible]
h                8             3.9
w                8             4.2
hw               0              .6


When the recording procedure was 
completed for all subjects, the experi- 
menter analyzed the consonant com- 
ponents in the recorded samples for 
the purpose of determining the num- 
ber and types of consonant articula- 
tion errors. As a reliability check, 
judgments on 10 of the test samples 
were repeated. On these 10 the ex- 
perimenter agreed with herself on 90 
per cent of the sounds involved. There 
was an observable tendency to iden- 
tify more errors when the records 
were heard for the second time. For 
this reason, all samples were evaluated 
and re-evaluated until the experi- 
menter felt that the best judgment 
possible had been made. This procedure
was carried out prior to compilation
of data on any other phase
of the study in order to avoid biased 
judgments. 


Results 


The number of articulation errors 
made by the subjects in this study 
ranged from zero to 91 with a mean 
of 38. The number of different con- 
sonant sounds misarticulated ranged 
from zero to 20 with a mean of 10. 


TABLE 2. Frequency of errors made by 48 cleft
palate subjects on 23 consonant sounds tested.

Sound   Total Number of   Number of Times   Per Cent
        Times Tested      in Error          in Error

s            864               547              63
z            432               261              61
dʒ            48                23              48
tʃ            48                21              44
θ             96                32              32
ʃ            144                46              32
k            480               153              32
g            240                73              30
ð            384                74              20
d            624               111              18
f            240                41              17
v            240                40              17
t           1152               188              16
p            288                32              11
ŋ            192                19              10
b            288                25               9
j            144                11               8
r            912                45               5
l            624                18               3
n           1008                33               3
w            384                 7               2
h            384                 8               2
m            480                 6               1


An analysis of the individual sounds
found to be in error yielded results
differing to some extent from certain
statements found in the literature. It is
true that sibilants and plosives were
often defective; but the order suggested
by Eckelmann and Baldridge
(4) and West, Kennedy and Carr
(10) was not found in these subjects.
The [s] and [z] sounds were by far
the most frequently defective while
[p] and [b] were among those most
infrequently in error. Table 2 lists
the consonants tested in the order of
frequency of error for the total group.
Thus [s], which was misarticulated
63 per cent of the total number of
times it was tested, appears first. The
consonant [m] was misarticulated
only one per cent of the time and
appears last. It will be noted that the
four sounds most often in error in
these adults, [s], [z], [dʒ], [tʃ], are
essentially the same sounds found most
frequently misarticulated by the children
studied by Spriestersbach et al.
(8). The order differs somewhat, and
the latter study included [ʒ], a sound
not tested in the present study. In
addition, the [θ] appears as their
second most frequently misarticulated
sound. This consonant ranks fifth in
this study along with [ʃ] and [k].
When these sounds were studied to
discover the consistency with which
they were defective in a given subject,
the results were in agreement with the
findings of several research studies
summarized by Spriestersbach and
Curtis (7), namely, that individuals
who misarticulate speech sounds typically
do so inconsistently. Table 3
summarizes the data concerning consistency
of articulation. These organically
impaired adults showed inconsistency
in the errors made, as did
the children studied by Spriestersbach
et al. (8). On only nine of the 23 consonants
tested did any of the subjects
err every time the sound was attempted.
Even this consistency is misleading
because errors on [tʃ] and
[dʒ] had to be shown as consistent
since each was tested only once; [θ]
was tested only twice. Of the remaining
six sounds where consistency of
error appeared, [p] was consistently
misarticulated by only one subject,
[k] by two subjects, [g] by three.


TABLE 3. Number of subjects misarticulating each consonant tested, percentage of misarticulation
of each sound, and number of subjects consistently misarticulating each sound.

Sound   No. Subjects      Lowest and Highest   Mdn. Per Cent     No. Subjects
        Misarticulating   Per Cent of          of                Misarticulating
        Sound             Misarticulation*     Misarticulation   Sound Consistently

p            19              17-100                17                  1
b            16              17-66                 17                  0
m             5              10-20                 20                  0
t            38               4-48                 17                  0
d            30               8-68                 23                  0
n            19               5-19                  5                  0
k            34              10-100                30                  2
g            25              20-100                60                  3
ŋ            14              25-50                 25                  0
s            38               6-100                94                 13
ʃ            22              33-100                67                  8
tʃ           21                100                100                 21
z            39               9-100                84                 12
dʒ           22                100                100                 22
f            21              20-80                 40                  0
v            23              20-60                 20                  0
θ            22              50-100                50                 10
ð            30              12-88                 25                  0
l            13               8-31                  8                  0
r            13               5-47                 11                  0
j            10              33-67                 33                  0
h             7              12-25                 12                  0
w             6              12-25                 12                  0

*Based upon total number of individual consonants tested for each subject.


[s], [z], and [ʃ] were the sounds
most likely to be consistently mis- 
articulated. However, even in the case 
of these sounds the tendency to be 
highly inconsistent was apparent. 
Only 13 subjects produced [s] in- 
correctly each time it was presented, 
and only 12 and eight, respec- 
tively, showed similar consistency of 
misarticulation of [z] and [ʃ]. On
none of the consonants on which some 
subjects showed consistent error did 
as many as half of the group misarticu- 
late the sounds each time they were 
presented. However, it should be 
noted that the five consonants most 
frequently misarticulated were the 
same sounds on which some con- 
sistency of misarticulation tended to 
appear. These were also the sounds 
which yielded the highest median percentage
of error. In short, the subjects
having difficulty on these sounds mis- 
articulated them more frequently and 
more consistently than they did the 
other consonants. 
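The arithmetic linking the test frequencies to the error percentages can be sketched briefly in Python. The counts for the four sounds below are taken from Tables 1 and 2; the function and variable names are illustrative only, and rounding to the nearest whole per cent is an assumption about how the published figures were obtained.

```python
SUBJECTS = 48

# sound: (occurrences per test from Table 1, errors from Table 2)
counts = {"s": (18, 547), "k": (10, 153), "p": (6, 32), "m": (10, 6)}


def percent_in_error(occurrences, errors, subjects=SUBJECTS):
    """Return (total times tested, whole-number per cent in error)."""
    times_tested = occurrences * subjects
    return times_tested, round(100 * errors / times_tested)


table = {sound: percent_in_error(*pair) for sound, pair in counts.items()}

# Rank sounds from most to least frequently in error.
ranked = sorted(table, key=lambda sound: table[sound][1], reverse=True)

for sound in ranked:
    tested, pct = table[sound]
    print(f"[{sound}] tested {tested} times, {pct} per cent in error")
```

Because each subject met every sound the same number of times, the total times tested is simply the test frequency multiplied by the 48 subjects.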

The clinically important tendency 
for these cleft palate subjects to be 
highly inconsistent in their articula- 
tion errors can perhaps be more 
clearly understood by a further ex- 
planation of Table 3, taking as an 
example the consonant [r]. It will be 
noted that only 13 of the 48 subjects 
misarticulated this sound. The table 
also shows that all 13 of these subjects 
misarticulated this sound only part of 
the time. Error occurred on from five 
to 47 per cent of the [r] sounds ap- 
pearing in the individual tests with the 
median falling at 11 per cent. In no 
instance did a subject misarticulate all
[r] sounds.


TABLE 4. Number of distortions, omissions, and substitutions recorded for each sound tested,
48 subjects combined.

Sound         Distortions   Omissions   Substitutions   Totals

p                  26            6             0            32
b                  25            0             0            25
m                   5            1             0             6
t                  96           92             0           188
d                  71           40             0           111
n                  24            7             2            33
k                  99           54             0           153
g                  67            4             2            73
ŋ                  16            3             0            19
s                 492           48             7           547
ʃ                  41            4             1            46
tʃ                 19            1             1            21
z                 246           15             0           261
dʒ                 23            0             0            23
f                  35            5             1            41
v                  23           16             1            40
θ                  22            6             4            32
ð                  54            4            16            74
l                  13            5             0            18
r                  17           21             7            45
j                   9            1             1            11
h                   6            2             0             8
w                   7            0             0             7

Totals           1436          335            43          1814

Percentages      79.2         18.5           2.4           100


We may conclude that, for the
most part, these adults with cleft
palate were able to produce all consonant
sounds correctly some of the
time. However, they were most consistent
in their misarticulation of
sibilant sounds.


As a next step in the articulation
test analysis, the manner of misarticulation
of sounds was studied. This
analysis revealed that the adults participating
in this study were more
likely to distort sounds than they were
to omit or to substitute sounds. Of the
1814 errors recorded, 1436 were distortions,
335 were omissions, and 43
were substitutions. These data appear
in Table 4. We can readily see from
reference to the table that the distortion
of speech sounds is the outstanding
characteristic of the articulation
patterns of these adults with cleft
palate. This differs from the findings
of Spriestersbach, Darley and Rouse 
(8) to the effect that the articulation 
errors of cleft palate children were 
most likely to be errors of omission 
and least likely to be errors of distor- 
tion. However, the discrepancy be- 
tween the findings of the two studies 
is compatible with Snow and Milisen’s 
(6) suggestion to the effect that the 
distorted speech sound represents a 
relatively higher level of development, 
which might be expected in adults, 
than do omissions or substitutions. We 
cannot explain similarly the relatively 
high incidence of omissions maintained 
by these adults with cleft palate. 
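The proportions of the three error types follow directly from the totals reported in Table 4. A short Python sketch, with variable names chosen only for illustration, reproduces them:

```python
# Error totals for the 48 subjects combined, as reported in Table 4.
errors = {"distortions": 1436, "omissions": 335, "substitutions": 43}

total = sum(errors.values())

# Share of each error type, to one decimal place.
shares = {kind: round(100 * n / total, 1) for kind, n in errors.items()}

for kind, share in shares.items():
    print(f"{kind}: {errors[kind]} of {total} errors, {share} per cent")
```

The rounded shares sum to slightly more than 100 because each is rounded separately.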


Implications 


We can interpret these results to 
mean that cleft palate persons are
likely to have greater need for clinical
attention to sibilant sounds than to 
any other sound classification. Also, as 
a group, they will show greater articu- 
lation improvement if these sounds are 
attacked first. On the other hand, the 
incidence of error in this sound group- 
ing is higher than in any other, and 
there is greater likelihood of the in- 
dividual’s being consistent in his 
sibilant errors. We may, therefore, 
assume that there will be greater diffi- 
culty in correcting these sounds than 
any of the others. If we are interested 
in starting therapy at a level where 
the patient has the best chance for 
success, we must consider the defec- 
tive non-sibilants, which, according to 
the results of this study, can be ac- 
curately articulated under some condi- 
tions by most cleft palate adults to an 
extent greater than can the sibilants. 

Another possible application of 
these data is the comparison of an 
individual’s articulation errors with 
group trends. For example, we know 
that slightly more than half of this 
group misarticulated [g] but that 
they were likely to be highly incon- 
sistent. The case falling at the median 
on error percentage produced the 
sound correctly 40 per cent of the 
time. Similar information available on 
other sounds may serve as an aid in 
evaluating severity of problems clin- 
ically. We contemplate the construc- 
tion of a scale for measuring the de- 
gree of severity of speech deficits in 
individuals as they compare with 
others having cleft palate. Such an 
instrument would help objectify clin- 
ical evaluations and would perhaps 
assist with the semantic problems en- 
countered in discussing the speech of 
people with cleft palate. 

This information suggests strongly 
that it is not enough to test each consonant
in each position only once and
to conclude from the resulting articulation
chart that we really know what
the cleft palate person is capable of 
doing—or even that we know what he 
does do with consonant sounds. This 
is one argument in favor of utilizing 
testing materials similar to those em- 
ployed in this study or at least of 
testing all sounds not once but many 
times in many different combinations. 
We may conclude, empirically, that 
articulation testing for these cases 
ought to include attention to their 
imitative abilities. While the matter of 
response to stimulation did not con- 
stitute a part of the present study, the 
evidence of similarity between cleft 
palate and so-called functional articu- 
lation problems can hardly be over- 
looked. If inconsistent articulation pat- 
terns are likely to occur, then it would 
be well to appraise the individual’s 
ability to produce consonants in isola- 
tion, nonsense syllables, and words 
when under direct auditory and visual 
stimulation.

The overwhelming incidence of the 
distortion of consonant sounds sug- 
gests the need for further basic study. 
These subjects demonstrated a wide 
range in the severity of distortion. 
Some approximated a correct sound 
production very closely, while others 
were scarcely able to remain within 
the bounds of a given phoneme. In 
other words, they approached the 
substitution portion of the error scale 
suggested by Snow and Milisen (6). 
It is believed by this investigator that, 
considering the frequency of distor- 
tion of sounds in this cleft palate 
sample, a great deal could be learned 
if one could measure degree of distor- 
tion of consonant sounds. The ability 
to do this would result in a somewhat 
more refined measuring instrument 
than is now available for evaluating 
severity of cleft palate speech problems.


Summary


Forty-eight adults with cleft palate 
recorded on tape speech samples con- 
taining 23 consonant sounds in fre- 
quencies comparable to their occur- 
rence in conversation. Productions of 
the consonant sounds were evaluated 
by the experimenter for determination 
of number and type of articulation 
errors. Analysis of the articulation
charts revealed the following: 


1. The [s] and [z] sounds were 
found to be most frequently and most 
consistently in error. 2. The subjects 
were highly inconsistent in their ar- 
ticulation of consonants and, in most 
cases, were able to produce all sounds 
correctly some of the time. 3. Distor- 
tion of consonant sounds was the out- 
standing characteristic of the speech 
of these subjects.


Louis Lerea


Language retarded children constitute 
an extremely heterogeneous group. An 
appreciable number of these children 
are retarded in their language develop- 
ment because of impaired hearing, 
cerebral insult, endocrine dyscrasias, 
mental retardation, emotional aberra- 
tions or some combination of these 
and other disturbances. In differential 
diagnosis the term aphasoid is often 
used in referring to those children 
who fail to acquire sufficient language 
despite an allegedly adequate physical, 
intellectual and emotional endowment. 
As a group, language retarded children 
may attempt to communicate by 
means of gesture, jargon or a pro- 
foundly limited vocabulary. Many 
seem to comprehend spoken language 
while others appear to ignore oral 
expression almost entirely. 

Not infrequently the speech cor- 
rectionist is called upon to assess the 
degree and nature of the language 
retarded child’s communication prob- 
lems. Articulation tests with their 
traditional emphasis upon sound units 
cannot provide the clinician with a 
satisfactory evaluation of language. 
The focus of this preliminary study
was upon two facets of language: 1.
vocabulary (lexicon) and 2. structure 
(syntax). More particularly, the in- 
tent of this research was to develop a 
standardized procedure which would 
measure the normal and the language 
retarded child’s ability to express and 
comprehend vocabulary and language 
structure. 

Bloomfield (1), Jespersen (6), Sapir 
(7), Curme (3) and other linguists 
have explained that ‘linguistic mean- 
ing’ is the sum of at least two com- 
ponents: lexicon and structure. The 
‘lexical meaning’ of a word may be 
found in any good dictionary. ‘Struc- 
tural meaning’ usually is suggested by 
the relationships between the lexical 
elements in a particular utterance. To 
illustrate variation in ‘structural meaning,’
consider the following statements:
The man found the black boot
and The boot black found the man. 
Although the words in these two 
sentences are identical, the structural 
changes convey a totally different 
message. 

The present research involved the 
construction of a set of clinical in- 
ventories which would yield quantita- 
tive data concerning the vocabulary 
and language structure of children be- 
tween the ages of three and nine. 
Vocabulary and structure were se- 
lected for study because these dimen- 
sions of language contribute heavily 
to communication and are readily 
amenable to standardization. In an 
effort to enhance interest and motivation,
pictures were used in building
the proposed inventories. 


Construction of the Picture 
Vocabulary Inventory 


A trial vocabulary series was con- 
structed from the Buckingham and 
Dolch (2) word count. At each of the 
age levels under study, 30 items were 
selected which met the following 
criteria: (1) the item could be pic- 
torially represented and (2) the word 
was judged to be within the experi- 
ential background of most children. 
The items were sketched in color on 
individual 5 x 7 inch cards. This pre- 
liminary series was administered to 
245 children representing each of 
seven age levels ranging from three 
years to nine years. A child was placed 
into that age group which was within 
six months of his nearest birthday. 
These children, 130 males and 115 
females, resided in Oakland and Wash- 
tenaw Counties of Michigan. The test 
was administered individually to each 
child in a quiet room of a school 
building. The child was requested to 
name each of the 210 pictures pre- 
sented by the examiner. Following this 
exploratory presentation, 49 key items 
were selected from the total picture 
series. These 49 items were correctly 
identified by an increasing percentage 
of children at four successive ages. In 
addition, each of these key items 
could be placed at an age level where 
approximately 50% of the children 
at that age level succeeded in naming 
the item. 
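The two selection criteria just described can be illustrated with a brief Python sketch. The pass rates below are invented for illustration and are not data from this study. The tolerance of ten percentage points around 50%, and the requirement that the run of four increasing ages include the placement age, are assumptions; the text does not state how "approximately 50%" was defined.

```python
AGES = [3, 4, 5, 6, 7, 8, 9]


def placement_age(pass_rates, target=50.0, tolerance=10.0, run=4):
    """Return the age level at which an item qualifies as a key item.

    An item qualifies at an age when (a) the per cent of children naming
    it at that age is within `tolerance` of `target`, and (b) that age
    lies inside a run of `run` successive ages with strictly increasing
    pass rates. Returns None when no age qualifies.
    """
    for i, (age, rate) in enumerate(zip(AGES, pass_rates)):
        if abs(rate - target) > tolerance:
            continue
        first = max(0, i - run + 1)
        last = min(i, len(AGES) - run)
        for start in range(first, last + 1):
            window = pass_rates[start:start + run]
            if all(a < b for a, b in zip(window, window[1:])):
                return age
    return None


# Hypothetical per cent naming the picture correctly at ages 3 through 9.
rising_item = [10, 25, 48, 70, 90, 95, 97]
flat_item = [50, 50, 50, 50, 50, 50, 50]

print(placement_age(rising_item))
print(placement_age(flat_item))
```

An item whose pass rate never rises with age is rejected even when about half the children name it, which reflects the requirement that key items discriminate between age levels.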

The 49 key pictures were used to 
construct a four-choice picture vocab- 
ulary test. Each key item was grouped 
with three related pictures. These sets 
of three related pictures served as 
foils or incorrect choice items in the 
test. Each of the key items was randomized
among its respective foils and
the four items were drawn in color on 
an 8 x 12 inch card. The 49 picture 
cards were then presented to a com- 
parable group of 70 children from 
ages three to nine. 

The purpose of this second pilot 
investigation of the picture vocabulary 
series was to establish the consistency 
of the word placements since the 
drawings and the test format had been 
changed. The procedure for admin- 
istering the inventory was as follows: 
The examiner presented card one to 
the child and pointing to foil one the 
examiner said, ‘See, this is a chimney; 
what is this?’ In this manner the child 
attempted to name every picture on 
the card. This procedure was followed 
with the other 48 items and the num- 
ber of correctly named key items was 
considered the child’s vocabulary ex- 
pression score. Returning to card one, 
the examiner then attempted to obtain 
an estimate of the child’s vocabulary 
comprehension. The examiner said, 
‘Let’s look at the pictures again.’ In 
this series only the child’s responses 
on the 49 key items were scored in 
computing his vocabulary comprehen- 
sion ability. The examiner directed the 
child to ‘Point to the cigarette; point 
to the pipe,’ etc. 

TABLE 1. Contents of sample cards at age
levels three through nine years.

Age   Key item and foils

3     chimney, cigarette, pipe, cigar
4     owl, eagle, robin, bluejay
5     saddle, horseshoe, hoof, hoop
6     bench, stool, hammock, couch
7     hydrant, faucet, fountain, tub
8     collar, jacket, cuff, sleeve
9     clock, compass, speedometer, thermometer


It was considered important to review
all the pictures on a card because
such a procedure would serve to in- 
hibit learning which could have oc- 
curred during the vocabulary expres- 
sion presentations. The results of this 
second preliminary investigation in- 
dicated that 35 key items, five at each 
of the seven age levels, maintained 
their age thresholds. Each item was 
assigned a point value of one. The 
contents of sample cards at each of 
the age levels studied are given in 
Table 1. 


The Construction of the Picture 
Language Structure Inventory 


A seven-year-old child who proudly 
announces to his parents, ‘Me drawed 
snow white,’ after completing a chalk 
drawing of snow falling is not neces- 
sarily retarded in either sound produc- 
tion or vocabulary. He is decidedly 
defective, however, in his grasp of the 
structure or syntax of English. 

Recently Fries (4) abandoned the 
conventional grammatical method of 
classifying sentences and their ‘parts 
of speech.’ In essence, Fries maintained 
that grammatical rules do not provide 
sufficiently consistent criteria to per- 
mit the objective study of oral expres- 
sion. He delineated the elements of 
conversation solely on the basis of the 
structure or form of the utterance. 
According to Fries, English is com- 
posed of two types of words, ‘parts of 
speech’ and ‘function words.’ The 
‘parts of speech’ were grouped into 
classes one through four and re- 
sembled the traditional nouns, verbs, 
adjectives and adverbs. To appreciate 
Fries’ contribution, however, these 
four classes should not be considered 
synonymous with established gram- 
matical categories. He identified the 
‘part of speech’ of each word by determining
whether the word could be
substituted into standard linguistic 
frames. Words which did not con- 
form to any one of these structural 
frames were identified as ‘function 
words.’ These function words usually 
do not have precise ‘lexical’ or ‘struc- 
tural meaning.’ Examples of such 
words are a, an, the, to, by, etc. Fries’ 
model of language structure was em- 
ployed in the construction of the 
picture language structure series. 


As in the case of the vocabulary 
series, the picture language structure 
inventory was designed to yield both 
a comprehension and an expression 
score. Classes one through four and 
three of Fries’ 15 ‘function word’ 
groups were selected to be included 
in the picture language structure in- 
ventory. These major categories oc- 
curred most frequently in conversa- 
tion and could be depicted pictorially. 
Seventy-five key items with two ac- 
companying foils for each were illus- 
trated on 5 x 15 inch cards. This 
initial series was administered to seven 
children at each of the ages between 
three and nine. The objective of this 
trial presentation was to ascertain 
whether the 75 picture items and the 
instructions possessed sensitivity and 
clarity. 

In general, the procedure used to 
obtain expression and comprehension 
scores for the picture vocabulary in- 
ventory was also followed with the 
structural items. The essential dif- 
ference was that in this language 
structure series the examiner first 
described every card within a par- 
ticular class or group. This procedure 
served to provide the context of the 
responses which the child would later 
be required to give. Following this 
explanation, the examiner attempted to 
elicit an oral response to the key items 
on the cards. For example, after 
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describing each of the 10 cards or 30 
pictures in class one, the examiner 
pointed to the appropriate pictures 
and said, ‘This is a dog; these are
animals; and these are—?’ The child
was expected to respond ‘dogs.’ When 
this expression sequence was com- 
pleted, the examiner presented the set 
of cards a third time and secured a 
measure of the child’s comprehension. 
He instructed the child to ‘Show me 
the dogs,’ etc. The revised inventory 
contained 50 items, each weighted 
with an arbitrary point score of two. 
This assigned value seemed necessary 
to allow partial credit for responses 
which were not entirely correct. If 
the child said, ‘The clown jumps in
the box’ (rather than into), he was 
given one point for this Group F re- 
sponse. Illustrative examples of the 
items within each category are as fol- 
lows: Class 1: feet, men; Class 2: drew, 
ate; Class 3: taller, shortest; Class 4: 
faster, slowest; Group A: his, that;
Group B: was, will; Group F: on, be- 
tween. To be sure, these structural 
responses would require that the child 
possess a basic vocabulary skill, but, 
in addition, they require a knowledge 
of the form of and relationships 
among English lexical elements. 
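
The scoring rule described above can be illustrated with a short sketch. It assumes only what the text states: 50 items, two points for a fully correct response, and one point of partial credit. The function and constant names are hypothetical and are not taken from the inventory itself.

```python
# Minimal scoring sketch for the revised language structure inventory.
# Assumptions (from the text): 50 items, 2 points for a correct response,
# 1 point for a partly correct response (e.g. 'in' where 'into' was expected).
# All names below are hypothetical.

FULL_CREDIT = 2
PARTIAL_CREDIT = 1
NO_CREDIT = 0
ITEM_COUNT = 50


def inventory_score(item_points):
    """Sum the per-item points (each 0, 1 or 2) into a total score."""
    if len(item_points) != ITEM_COUNT:
        raise ValueError(f"expected {ITEM_COUNT} items, got {len(item_points)}")
    if any(p not in (NO_CREDIT, PARTIAL_CREDIT, FULL_CREDIT) for p in item_points):
        raise ValueError("each item must be scored 0, 1 or 2")
    return sum(item_points)


# A child who answers 49 items correctly and says 'in' for 'into' on one item.
example = [FULL_CREDIT] * 49 + [PARTIAL_CREDIT]
print(inventory_score(example))
```

Under these assumptions the highest attainable score on either the comprehension or the expression scale is 100.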


Reliability of the Inventories 


The revised picture vocabulary and 
language structure inventories were 
administered to 20 children at each of 
the age levels considered in this re- 
search. These 65 males and 75 females 
were tested in the preschools and the 
elementary schools which they at- 
tended. A comprehension score and 
an expression score were obtained 
from each of the two inventories. The 
odd-even reliability coefficients com- 
puted from the performances of these 


140 subjects were corrected for length 
by means of the Spearman-Brown 
Prophecy Formula. The coefficients 
for vocabulary comprehension, vocab- 
ulary expression, structural compre- 
hension and structural expression were 
.93, .90, .94 and .95, respectively. The 
wide variability in the ages of the 
subjects and the fact that language 
ability and chronological age are re- 
lated in childhood probably con- 
tributed heavily to these reliability 
coefficients. 
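
The length correction mentioned above can be sketched as follows. The Spearman-Brown formula estimates the reliability of a full-length test from the correlation between its halves, r_full = 2 r_half / (1 + r_half). The article reports only the corrected coefficients, so the half-test correlations printed here are back-computed illustrations, not values from the study. The function names are hypothetical.

```python
# Sketch of the Spearman-Brown correction applied to odd-even reliabilities.
# The uncorrected half-test correlations are not reported in the article;
# they are recovered here by inverting the formula, purely for illustration.


def spearman_brown(r_half, length_factor=2.0):
    """Estimate the reliability of a test lengthened by `length_factor`."""
    return length_factor * r_half / (1.0 + (length_factor - 1.0) * r_half)


def implied_half_correlation(r_full):
    """Invert the two-half case: r_half = r_full / (2 - r_full)."""
    return r_full / (2.0 - r_full)


# Corrected coefficients as reported in the text.
reported = {
    "vocabulary comprehension": 0.93,
    "vocabulary expression": 0.90,
    "structural comprehension": 0.94,
    "structural expression": 0.95,
}

for scale, r_full in reported.items():
    r_half = implied_half_correlation(r_full)
    print(f"{scale}: implied odd-even r = {r_half:.3f}, "
          f"corrected = {spearman_brown(r_half):.2f}")
```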


Validity of the Inventories 


In the absence of external criteria, a 
common procedure for assessing the 
validity of a diagnostic tool is to com- 
pare performances of groups known 
to possess varying degrees of the trait 
which the experimental instrument 
was designed to measure. Theoretical- 
ly, any scale which purports to 
measure the language development of 
children should yield scores of in- 
creasing magnitude at successive age 
levels. Empirical validity of a lan- 
guage battery may also be inferred if 
the performances of children judged 
to be language retarded are inferior to 
those of a comparable group of normal 
speaking children. 


Subjects. The scores of the 140 
children employed in deriving the 
reliability indices were also used to 
validate the proposed inventories. In 
addition, the performances of two 
selected groups of normal children 
were compared with the performances 
of the language retarded groups. 

Two groups of 16 language retarded 
children and two comparable groups 
of children who were normal with 
regard to language development served 
as subjects in this part of the validity
study. The first of the deviant groups
was classified as brain-injured with 
associated language retardation. These 
brain-injured children were out-
patients undergoing treatment at the
Children’s Hospital of Detroit. The
diagnosis of brain injury was estab- 
lished by the medical staff. All of 
these children were taking relaxant or 
anticonvulsant drugs. Each member of 
this group was matched with a normal 
speaking child in terms of age and sex. 
These groups ranged in age from three 
years and five months to eight years, 
with a median age of six years and 
two months; their hearing and intel- 
ligence were within normal limits. The 
Columbia Mental Maturity Scale was 
employed to determine each child’s 
intelligence quotient. The mean in- 
telligence quotient and standard devia- 
tion of this group were 96.6 and 9.55, 
respectively. 

The second group of 16 language 
retarded children attended the ele- 
mentary and preschools of Oakland 
County. These children were reported 
to be language retarded by the public 


school speech correctionists. There 
was no evidence of either reduced 
hearing acuity or intellectual retarda- 
tion among the children included in 
this group. Many of these children 
had been previously diagnosed as 
aphasoid. The parents of these chil- 
dren did not report any significant 
deviations in the medical or social 
histories of the children. It should be 
pointed out that inasmuch as these 
children had not been previously ex- 
amined by a pediatrician or a psy- 
chiatrist, it is entirely possible that 
some children in this group were not 
appropriately designated aphasoid. As 
with the brain-injured group, age and 
sex were considered in matching the 
aphasoid children with a second 
normal group. The age range of these 
children was from three years and 
nine months to eight years and five 
months; the median age was six years 
and nine months. Their mean intel- 
ligence quotient as computed with the 
Columbia Mental Maturity Scale was 
98.8 and the standard deviation was 
11.85. 


Table 2. Means and standard deviations for the seven age groups of normal children in vocabulary
comprehension, vocabulary expression, language structure comprehension and language structure
expression scores.

Type of                               Age Groups
Performance                      3      4      5      6      7      8      9

Vocabulary            Mean    8.05  13.45  19.10  22.20  27.60  30.75  31.00
Comprehension         S.D.    1.76   4.41   4.43   5.31   3.13   1.91   2.53

Vocabulary            Mean    4.65   8.65  13.50  18.05  22.30  25.50  29.25
Expression            S.D.    1.69   4.15   4.36   4.18   3.45   4.48   3.69

Language Structure    Mean   41.80  68.00  80.00  83.80  88.25  90.80  94.20
Comprehension         S.D.   16.52  12.12   8.39   9.62   5.97   8.19   2.83

Language Structure    Mean   18.55  41.80  59.65  64.85  67.75  77.10  80.80
Expression            S.D.    9.98  14.11 11.738  18.09  15.61  11.76   9.02











80 JOURNAL OF SPEECH AND HEARING RESEARCH 





[Figure: mean score plotted against chronological age (3 to 9 years); curves labeled Structural Comprehension and Structural Expression.]





Figure 1. Graphic representation of mean
vocabulary comprehension and expression 
scores for groups of 20 normal children at 
each of seven age levels. 


The median ages of the two normal 
groups were six years, one month and 
six years, seven months. Their intel- 
ligence quotients were 99 and 101 with 
standard deviations of 7.31 and 6.84, 
respectively. 


Procedure. All examinations were 
administered in a quiet room of the 
hospital or school. The picture vo- 
cabulary inventory was given first, 
followed by the picture language 
structure series and the Columbia 
Mental Maturity Scale. Because the 
completion of the entire battery often 
required as much as two hours, rest 
periods were provided following each 
of the examinations and whenever the 
child seemed to show fatigue. In the 
interest of accuracy and speed, one ex- 
aminer presented the item and another 
examiner recorded the child’s verbatim 
responses.


Results. The means and standard 
deviations for vocabulary and struc- 
ture scores for each of the seven age 
groups of normal children are given 
in Table 2. The differences in magni- 
tude of the standard deviations at the 
various age levels within each type of 
language performance are of some 
importance. These differences are 
probably a reflection of a rather 
marked variability in language acquisi- 
tion between age levels. It is to be 
expected that the younger age groups 
would be relatively homogeneous, 
since they are probably not exposed
to either the numbers or the kinds of
language-demanding situations that
confront children in the upper age groups.
School age children are required to 
participate in activities which place a 
high premium on language skills and 
therefore there is a greater oppor- 
tunity for variability at these age 
levels. Figure 1 is a graphic representa-
tion of the mean scores for the 140

Figure 2. Graphic representation of mean
language structure comprehension and ex-
pression scores for groups of 20 normal
children at each of seven age levels.

Table 3. Summary of t-tests of differences between means of vocabulary and structure perform-
ances for the brain-injured (BI), aphasoid (A), and two groups of matched normal children (N)
(VC = vocabulary comprehension, VE = vocabulary expression, SC = structural comprehension,
SE = structural expression).

Performance    Group    Means    S.E. diff.    t

VC             N        21.56    1.45           5.43*
               BI       13.69

VE             N        16.25    1.65           4.64*
               BI        8.60

VE/VC          N         0.74    0.07           2.08
               BI        0.59

VC             N        23.31    1.48           4.20*
               A        17.31

VE             N        18.50    1.64           3.66*
               A        12.50

VE/VC          N         0.78    0.15           0.48
               A         0.71

SC             N        82.56    6.18           6.20*
               BI       44.25

SE             N        65.56    4.01          12.22*
               BI       16.56

SE/SC          N         0.79    0.08           4.94*
               BI        0.41

SC             N        86.81    6.07           3.26*
               A        67.00

SE             N        64.31    5.71           4.94*
               A        36.12

SE/SC          N         0.73    0.18           1.31
               A         0.49

*Significant at or beyond the .01 level.


normal children in both their vocab-
ulary comprehension and expression
performances. In general the means 
for expression rise progressively with 
increase in age. Vocabulary compre- 
hension, however, seems to reach a 
ceiling at age eight. 

Figure 2 portrays the performances 
of this standardized sample in compre- 


hension and expression of language 
structure. The plotted mean values 
again reveal increments at successive 
age levels. The comprehension and ex- 
pression lines for language structure 
means rise rapidly between the ages 
of three and five. Thereafter the
curves rise more slowly.

The t-test for related measures was
computed to determine whether the 
brain-injured and aphasoid children 
differed significantly from their re- 
spective matched groups of normal 
children in vocabulary and language 
structure and to test the differences in 
mean ratios of expression to compre- 
hension scores between the language- 
retarded and normal groups. 

The results, which are given in 
Table 3, provide evidence that each of 
the two language-retarded groups, 
when compared with its matched 
normal group, was handicapped in 
ability to comprehend as well as to 
express vocabulary and structure.
Neither the aphasoid nor the brain- 
injured children, however, differed sig- 
nificantly from the matched normals 
with respect to the mean ratios of 


vocabulary expression to vocabulary 
comprehension. Similarly, the mean 
ratio of structural expression to struc- 
tural comprehension for the aphasoid 
group was not significantly different 
from that of its matched normal 
group. However, the brain-injured 
group had a significantly smaller mean 
language structure ratio than its 
matched normal group. 
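
The t values in Table 3 can be checked from the tabled means and standard errors, since each t is the difference between the two group means divided by the standard error of the difference. The sketch below does this for the four score rows of the brain-injured comparison. The critical value of 2.947 is the standard two-tailed .01 value for 15 degrees of freedom; the article does not state it, so it rests on the assumption of 16 matched pairs and a two-tailed test. All names are hypothetical.

```python
# Sketch: recompute t = (mean_normal - mean_retarded) / S.E. of the difference
# for the brain-injured (BI) rows of Table 3.
# Assumption: df = 15 (16 matched pairs), two-tailed test at the .01 level,
# for which the standard tabled critical value is 2.947.

CRITICAL_T_01_DF15 = 2.947

# measure: (mean of normals, mean of brain-injured, S.E. diff., tabled t)
table3_bi_rows = {
    "VC": (21.56, 13.69, 1.45, 5.43),
    "VE": (16.25, 8.60, 1.65, 4.64),
    "SC": (82.56, 44.25, 6.18, 6.20),
    "SE": (65.56, 16.56, 4.01, 12.22),
}


def t_from_summary(mean_a, mean_b, se_diff):
    """Return the t ratio for a difference between two means."""
    return (mean_a - mean_b) / se_diff


for measure, (m_normal, m_bi, se, tabled) in table3_bi_rows.items():
    t = t_from_summary(m_normal, m_bi, se)
    verdict = "significant" if abs(t) >= CRITICAL_T_01_DF15 else "not significant"
    print(f"{measure}: t = {t:.2f} (tabled {tabled:.2f}), {verdict} at .01")
```

The same ratio applies to the expression-to-comprehension rows, although the tabled ratios are rounded to two decimals and therefore reproduce the published t values only approximately.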


The differences between the mean 
scores of the aphasoid and brain-
injured groups (Table 4) were evalu-
ated by using the t-test for inde-
pendent measures. The mean struc-
tural comprehension and structural
expression scores of the two groups 
were significantly different, with the 
aphasoid children excelling the brain- 
injured children. All other differences 
were not statistically significant. 


Table 4. Summary of t-tests of differences between means of vocabulary and structure perform-
ances for the aphasoid (A) and brain-injured (BI) groups (VC = vocabulary comprehension, VE =
vocabulary expression, SC = structural comprehension, SE = structural expression).

Performance    Group    Means    S.E. diff.    t

VC             A        17.31    2.87          1.26
               BI       13.69

VE             A        12.50    2.46          1.59
               BI        8.60

VE/VC          A         0.71    0.12          1.00
               BI        0.59

SC             A        67.00    8.53          2.67*
               BI       44.25

SE             A        36.12    6.56          2.98†
               BI       16.56

SE/SC          A         0.49    0.08          1.00
               BI        0.41

*Significant at the .05 level.
†Significant at the .01 level.


Discussion and Conclusions


The proposed picture language in- 
ventories seem to possess sufficient 
sensitivity to assess vocabulary and
language structure. The
performances of the sample of normal 
children indicated that both compre- 
hension and expression appear to be 
a positive function of age. It was also 
found that, at the upper age levels, 
vocabulary expression scores approach 
vocabulary comprehension scores. It 
may be inferred from this finding that 
as a child is exposed to more formal 
classroom situations with their tradi- 
tional emphasis upon oral communica- 
tion, his expressive abilities accelerate 
at a faster rate than his comprehension 
skills. This tendency for expression to 
approximate comprehension as age in- 
creases was also somewhat evident in 
language structure scores. Also not to 
be overlooked is the simpler and per- 
haps more parsimonious explanation 
that this decreasing difference between 
expression and comprehension at the 
upper age levels is an artifact of the 
test itself. Since the upper limits of 
the test did not extend beyond age 
nine, these children may not have had 
an opportunity to demonstrate their 
maximum ability to comprehend vo- 
cabulary. In the light of these con- 
siderations caution must be exercised 
in interpreting vocabulary compre- 
hension performances at age nine. 

An aphasoid and a brain-injured 
language retarded group were in- 
cluded in this investigation to provide 
an alternate method of evaluating the 
validity of the suggested inventories. 
The picture vocabulary and language 
structure series revealed that these 
children were handicapped in both 
comprehension and expression. Thera- 
peutic procedures with these children 
should probably include work on im-
proving comprehension before under-
taking to improve oral expression. 

Although the language of the apha- 
soid group was retarded, the relation- 
ship between their expression and 
comprehension performances did not 
differ importantly from that of the 
normals in either vocabulary or struc- 
ture. In addition, the aphasoid group 
was considerably superior to the 
brain-injured group in their structural 
comprehension and structural expres- 
sion performances. It is tempting to 
conclude that the term aphasoid as 
usually defined is probably inappro- 
priate for this group. If one may dis- 
regard the factor of age at which a 
cerebral insult is sustained, this posi- 
tion seems justifiable. Schuell (8) re- 
ported that adult dysphasics demon- 
strate a critical impairment in the 
auditory sphere. Although these apha- 
soid children were retarded in struc- 
tural comprehension, their disability 
was far less severe than that of the 
brain-injured group. This marked dis- 
ability in the brain-injured group’s 
comprehension may be associated with 
their retardation in expression. Also 
to be remembered is that the children 
in the aphasoid group may have been 
suffering from mild emotional dis- 
turbances. It is conceivable, therefore, 
that their language disability was a 
manifestation of infantilism or en- 
vironmental deprivation. 

The brain-injured children partici- 
pating in this study did not show sig- 
nificant differences from the normals 
in mean ratio of vocabulary expres- 
sion to vocabulary comprehension. 
This finding is supported by the work 
of Wepman et al. (10) with an adult 
dysphasic which led the researchers to 
question the validity of the term 
‘anomia.’ They reported that the adult 
dysphasic did not demonstrate a focal 
disability in the expression of noun
forms as distinct from other gram-
matical ‘parts of speech.’ The picture 
vocabulary inventory required a nam- 
ing, or grammatically speaking, a noun 
response. Like the adult dysphasic, the 
language retarded brain-injured chil- 
dren were defective in every aspect 
of language and their general perform- 
ance did not reflect a specific dis- 
ability for nouns alone. 

All three structural performance 
scores for the brain-injured group dif- 
fered significantly from those of their 
matched normal group. This finding is 
in agreement with observations re- 
corded by Goldstein (5) and Strauss 
and Kephart (9). These writers have 
reported that cerebral insult is mani- 
fested by the patient’s inability to 
categorize and associate concepts. 
Conventional syntax requires an ability 
to integrate and relate series of words. 
It is to be expected, therefore, that 
these brain-injured children would 
deviate considerably in all measures of 
language structure. Inasmuch as the 
mean structural ratio of the brain- 
injured group differed from that of 
the normals, whereas the aphasoid and 
normals did not differ with regard to 
this measure, the ratio of structural 
expression to comprehension may 
serve as a valuable index in differential 
diagnosis. Also of importance in dif- 
ferential diagnosis are the structural 
comprehension and expression scores 
which differentiated the brain-injured 
from the aphasoid. 

In discussing the results obtained in 
this study, reference has been made to 
research conducted with adult dys- 
phasics. The inference should not be 
made, however, that the brain-dam- 
aged child was considered to be suf- 
fering from the same type of syn- 
drome as the dysphasic adult. The 
adult has incurred a cerebral injury 
after he has developed learning skills, 


i.e., he has already learned how to 
learn. In contrast, the brain-damaged 
child has sustained a cerebral insult 
before or during the development of 
his learning processes. The child is 
struggling to acquire the necessary 
skills for maturity without the experi- 
ential yardstick of pre-morbid 
achievement which is available to the 
adult dysphasic. The brain-injured
child is learning; the dysphasic is re-
learning. The brain-injured child is
adjusting; the dysphasic is readjusting.
Generalizations which encompass both 
the brain-injured child and the adult 
must be viewed with caution. 

In conclusion, the proposed picture 
vocabulary and picture language struc- 
ture inventories seem to be useful for 
assessing the language development of 
children. The picture language struc- 
ture series may prove to be a useful 
instrument in defining language re- 
tardation. At present the inventories 
remain in an experimental stage. The 
discriminative power of each item in- 
cluded in the inventories must be 
analyzed and normative data collected 
before these inventories achieve the 
status of an effective clinical instru- 
ment. 


Summary 


The present study was prompted by 
the need for objective clinical in- 
struments which could evaluate the 
extent of language retardation among 
handicapped children between the 
ages of three and nine. Picture lan- 
guage inventories were constructed to 
evaluate both expression and compre- 
hension of vocabulary and structure. 
These inventories were administered 
to groups of normal, brain-injured, 
and aphasoid children. The computed 
reliability and validity measures sug-
gest that these picture language inven-
tories possess sufficient sensitivity 
eventually to become effective sup- 
plementary tools in the diagnosis of 
language retardation.


Children’s Reactions To Nonfluencies
In Adult Speech


Thomas G. Giolas


Dean E. Williams


It has been well established that non-
fluencies occur in the speech of young
children (3, 5). Furthermore, Johnson
(4) has observed that in certain in-
stances these nonfluencies are reacted
to or labeled by adults as ‘stuttering.’
He has hypothesized that stuttering,
as a clinical problem, develops after
the diagnosis.

Tuthill (7), Bloodstein (1) and
Boehmler (2) have reported investiga-
tions in which they studied the reac-
tions of adults to different types and
frequencies of speech nonfluencies. To
date no attempt appears to have been
made to determine to what extent a
child is aware of or reacts to non-
fluent speech.


It is recognized that children ex-
hibit a tendency to classify or label
one another, as pointed out by Mur-
phy (6). If children learn early to
classify and label people, events, or
actions which they consider ‘different,’
then it would seem that they might
react similarly to certain types of
speech nonfluencies. This appears rea-
sonable in view of the fact that they
live in a society which often considers 
certain types of nonfluencies, par- 
ticularly syllable repetition, as being 
‘different’ or ‘abnormal.’ 

An important consideration, there- 
fore, is to determine whether children 
react to nonfluencies in the speech of 
others and, if so, whether their reac- 
tions are favorable or unfavorable. If 
it can be demonstrated that they re- 
act unfavorably, it might be assumed 
that they not only may react to, and 
label as ‘different,’ the nonfluencies 
which occur in the speech of others 
but that they may also react similarly 
to those which occur in their own 
speech. 

It was the purpose of this study 
to determine whether specific kinds of 
nonfluencies influence a child’s pref- 
erence for (1) a particular story or 
(2) a particular person telling a story. 
In addition, an evaluation was made 
of the extent to which the children 
referred to the speech nonfluencies 
in giving reasons for their preferences. 


Procedure 


Subjects. The subjects were 120 
kindergarten and second-grade chil- 
dren, ranging in age from five years 
and five months to eight years. They 
were divided approximately equally as 


March 1958 
















to sex. No child was included who had 
a speech defect or a history of speech 
therapy or who had had in his class- 
room a person diagnosed as a ‘stut- 
terer.’ 


Reading Passages. Three 250-word 
reading passages were composed. 
These stories were evaluated by per- 
sonnel of the Indiana University Read- 
ing Clinic as being comparable in sub- 
ject matter, suitable in content in 
terms of grade level and equal in 
comprehension difficulty. Parity of in- 
terest level was determined by reading 
the three stories to 19 second-grade 
children. Of these children, seven 
selected Story 1 as their favorite, six 
preferred Story 2 and six chose Story 
3, indicating that there was no ten- 
dency to select a particular story in 
preference to any other. 


Types of Nonfluencies. Three 
copies of each of the stories were pre- 
pared. One contained no modifications 
and was identified as the Fluent Pat- 
tern. A second copy was modified so 
that a predetermined number and kind 
of interjections were embodied in the 
passage. This was called the Inter- 
jections Pattern. The number of inter- 
jections inserted corresponded to 10 
per cent of the total number of words 
in each story. The following three 
kinds of interjections were inserted 
randomly through each passage: [a], 
[a], and [a]. A table of random num- 
bers was used to determine the dis- 
persion of interjections throughout 
each passage. The third copy, identi- 
fied as the Repetitions Pattern, was 
modified to include a predetermined 
number of repetitions of the initial 
portion of certain words. Half of the 
nonfluencies were two-syllable and 
half were three-syllable repetitions,
for example, ‘b- b- boy’ and ‘b- b- b-
boy.’ Again interruptions were 10 per 
cent of the total number of words in 
each story, and they were located 
randomly as in the Interjections Pat- 
tern. 

The Fluent Pattern was used as a 
control; the Interjections Pattern 
represented a type of nonfluency 
which might or might not be evalu- 
ated as ‘stuttering’; and the Repeti- 
tions Pattern represented a type which 
is often diagnosed as ‘stuttering’ (2). 


Method of Recording. Three female 
speech clinicians were rehearsed in 
their reading of the stories until each 
speaker could produce the stories 
using each of three patterns. The Flu- 
ent Pattern was rehearsed until it 
could be read without any observable 
breaks in the rhythm of speech. Both 
the Interjections Pattern and the 
Repetitions Pattern were practiced 
until the correct number and kind of 
nonfluencies could be inserted into the 
passage without noticeable change in 
the reading rate or inflection pattern. 
Three trained speech correctionists 
judged each reading as adequate. 
Tape-recordings were then made of 
the stories with the speakers, the 
stories and the patterns rotated to 
provide for a recording of each story 
with each of the patterns by at least 
two speakers. 


The Experiment. The experiment 
consisted of two parts. Part I was 
designed to study the effect of a spe- 
cific fluency pattern upon story pref- 
erence. Thirty-six second-grade chil- 
dren were divided into three experi- 
mental groups. No two groups heard 
the stories in the same order, nor was 
the same story presented to any two 
groups with the same fluency pattern.

Table 1. Distribution of frequencies of choices
by 36 second-grade children of story content
and results of the chi-square test.

Stories            Choices          Chi-square*
               1st    2nd    3rd
1                8     16     12        2.67
2               14      8     14        2.00
3               14     12     10         .75

*A chi-square value of 5.99 (df = 2) is re-
quired for significance at the five per cent level.


The recordings were played in a 
quiet room to the children in groups 
of six. They were told that they 
would hear three stories and that they 
were to listen closely as they would be 
asked to name the story they liked 
best. The title of each story was an- 
nounced and the three recordings 
were then played. The following ques- 
tions were then asked each child 
privately: 

1. Which story did you like the best? 

2. Which story did you like next best? 

3. (a) Then you put this story (the re- 

maining story was named) last. Why 
did you put it last? 

(b) Why didn’t you like the last 
story? 

(c) Didn’t you like the story or didn’t 
you like the way it was told? Why? 


The children’s responses to each ques- 
tion were recorded verbatim. The 
three parts of Question 3 were struc- 
tured in such a way that they became 
progressively more specific and that 
the suggestion introduced to a degree 
in part (b) and to a greater degree in 
part (c) would not affect the answers 
to the preceding parts. 

Part II was designed to study the 
effect of a particular fluency pattern 
upon a child’s preference for a specific 
speaker. Subjects were 36 additional 


second-grade children and 30 kinder- 
garten children. The children listened 
to the same story three times, pre- 
sented each time by a different speaker 
using a different fluency pattern. The 
recordings were played in a quiet 
room to the children in groups of six. 
The order of presentation for the 
three speakers was uniformly rotated 
from group to group. In each instance 
the children were instructed that they 
would hear three ladies telling the 
same story and that they were to 
decide which lady they would like for 
a teacher. Each child was then asked 
the following questions in private: 


1. Which lady would you like to have 
for a teacher? Why did you pick that 
lady? 

2. If you could have two teachers, which 
lady would you pick second (or 
next)? Why? 

3. Now, you didn’t pick the lady—(the 
third choice was mentioned). Can you 
tell me why you didn’t pick her? 


The responses were recorded verba- 
tim. 


Results 


Table 2. Distribution of frequencies of choices
by 36 second-grade children of speech patterns
and results of the chi-square test.

Patterns              Choices          Chi-square*
                  1st    2nd    3rd
Fluent             12      8     16        2.67
Interjections       9     15     12        1.50
Repetitions        15     13      8        2.47

*A chi-square value of 5.99 (df = 2) is re-
quired for significance at the five per cent level.


Part I. The chi-square test, under
the hypothesis of chance distribution
of choices, was used to evaluate the
effect upon the story preferences of
the following variables: (1) the story
content, (2) the speech pattern used
in telling the story, and (3) the posi-
tion (first, second or third) in which
the stories were presented.
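
The chi-square test under the hypothesis of chance distribution can be sketched as follows. With 36 children and three possible ranks, chance predicts 12 choices in each rank, and the statistic is the sum of (observed - expected)² / expected over the three ranks. The observed frequencies are the Story 1 and Story 2 rows of Table 1, and the critical value of 5.99 is the one given in the table footnote. The function name is hypothetical.

```python
# Sketch of the chi-square test against a chance (equal) distribution of
# first, second and third choices. Observed counts come from Table 1;
# the critical value 5.99 (df = 2, five per cent level) is from its footnote.

CRITICAL_05_DF2 = 5.99


def chi_square_equal_expected(observed):
    """Chi-square for observed counts against equal expected counts."""
    expected = sum(observed) / len(observed)
    return sum((o - expected) ** 2 / expected for o in observed)


story_choices = {
    "Story 1": [8, 16, 12],   # first, second, third choices
    "Story 2": [14, 8, 14],
}

for story, counts in story_choices.items():
    chi2 = chi_square_equal_expected(counts)
    verdict = "significant" if chi2 >= CRITICAL_05_DF2 else "not significant"
    print(f"{story}: chi-square = {chi2:.2f}, {verdict} at the five per cent level")
```

The same function applies to the rows for speech pattern and for position of presentation, since each row also distributes 36 choices over three ranks.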

Table 1 presents the distribution of 
first, second and third choices of each 
story content. Chi-square test results 
provided no evidence of differences 
among first, second and third choices 
for any story content. 

Table 2 shows the distribution of 
first, second and third choices of each
of the patterns, identified as Fluent, 
Interjections and Repetitions Patterns. 
Chi-square results were not significant 
and provided no evidence that any one 
of the patterns affects the distribution 
of first, second and third choices of 
stories. 


Table 3. Distribution of frequencies of choices
by 36 second-grade children of positions of story
presentation and results of the chi-square test.

Positions          Choices          Chi-square*
               1st    2nd    3rd
1                4      9     23       16.17
2               19     10      7        6.50
3               13     17      6        6.00

*Chi-square values of 5.99 and 9.21 (df = 2)
are required for significance at the five and the
one per cent levels, respectively.


The distribution of first, second and 
third choices for each of the three 
positions of story presentation is 
shown in Table 3. Chi-square results 
were significant in each instance. An 
examination of the distributions shows 
that first position was least favored. 
Each story, however, was heard an 
equal number of times in each posi- 
tion and the different patterns were 


used in each position. The only ap- 
parent explanation is that the story 
presented first had either been largely 
forgotten or had lost much of its 
interest. 


The question of whether the posi- 
tion variable may have obscured other 
differences, particularly with respect 
to speech fluency patterns, cannot be 
answered from these data. 

The responses, however, to the 
questions about the least preferred 
story indicated that many children 
were aware of the speech patterns and 
referred to them as the reason for 
selection or non-selection of a specific 
story. The responses were analyzed to 
determine how many made a reference 
to the story, the speaker or the man- 
ner in which the story was told when 
stating reasons for designating a given 
story as third choice. All answers 
were considered compositely, regard- 
less of which speech pattern was em- 
ployed in relating the story. 

To Question 3(a), ‘Why did you
put it last?’ 12 of the 36 children 
answered, ‘I don’t know,’ and 17 made 
some reference to the story. Seven 
referred to the story teller and the 
way the story was told. Of these 
seven, three made direct reference to 
the speech pattern, stating that ‘She 
talked funny,’ ‘They said um, um, um’
and ‘One went uh, uh, uh, and she 
stuttered.’ 

In response to Question 3(b), ‘Why 
didn’t you like the last story?’ only 
two subjects gave no reasons for their 
preferences. Seventeen referred to the 
story and 17 made reference to the 
speech pattern or narrator. The latter 
made such remarks as ‘Because she 
said uh, uh, uh,’ ‘She talked funny and 
silly’ and ‘She stuttered.’ 


Question 3(c), ‘Didn’t you like the 
story, or didn’t you like the way it 
was told?’ suggested to the child a
choice of two reasons for his selection,
the second of which implied a speaker 
preference. Three children were un- 
decided or gave ambiguous answers 
and 11 gave answers which referred to 
the story. Twenty-two children speci- 
fied the story teller or speech pattern 
with such references as ‘She talked an, 
an, an, like that,’ ‘She stuttered,’ ‘She 
pronounced all the words funny’ and 
‘I like the story but she talked funny.’ 

The children’s answers were also 
analyzed with reference to each of 
the fluency patterns. Five of the 16 
children selecting the story told with 
the Fluent Pattern as their third choice 
made no mention of the speech pattern 
in their replies. The remaining 11 did 
refer to the speech pattern as the 
reason for their choice in response to 
at least one of the three questions. 
Surprisingly, four of these stated that 
the speaker ‘stuttered.’ 

The story told with the Interjec- 
tions Pattern was third choice for 12 
children. Four made no reference to 
the speech pattern and eight men- 
tioned it. 

All but one of eight children who 
rated the story with the Repetitions 
Pattern third made direct reference to 
the speech pattern. Two mentioned 
‘stuttering’ specifically. 

At times it was observed that a 
child appeared unsure of the story to 
which he was referring, sometimes 
making comments about a story which 
did not correspond to the speech pat- 
tern used in telling that story. Re- 
gardless of the questionable reliability 
of their memories, the reasons given 
by the children for their selections do 
indicate that some of them reacted to 
the speech interruptions. 


Part II. The chi-square test, under
the hypothesis of chance distribution
of choices, was used to evaluate the
effect of three speech patterns (Flu-
ent, Interjections and Repetitions Pat-
terns) on the preferences of children
for a speaker. As indicated previously,
the children listened to the same story
presented by three speakers, each
using a different pattern and with
position of speakers (or patterns)
counterbalanced.


Table 4. Distribution of frequencies of choices
by 36 second-grade children of speakers and
results of the chi-square test. Each of three
speakers used a different speech pattern.

Patterns              Choices           Chi-square*
                 1st    2nd    3rd

Fluent            31      4      1         45.50

Interjections      3     24      9         19.50

Repetitions        2      8     26         26.00

*A chi-square value of 9.21 (df = 2) is re-
quired for significance at the one per cent level.


Table 4 presents the distribution of 
choices by 36 second-grade children 
for each speaker (or pattern). Sig- 
nificant results of the chi-square tests 
and the distributions of choices pro- 
vide evidence that second-grade chil- 
dren will rank the Fluent Pattern as 
first choice, the Interjections Pattern 
as second choice and the Repetitions 
Pattern as third choice. 
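
The chi-square values in Table 4 can be checked with a short calculation. The sketch below assumes that "chance distribution" means equal expected frequencies for first, second and third choice (12 of 36 for each rank), which is consistent with the tabled values. The function and variable names are illustrative and do not come from the study.

```python
# Chi-square goodness-of-fit against equal expected frequencies.
# Counts are copied from Table 4; all names here are illustrative.

def chi_square_uniform(observed):
    """Chi-square statistic for counts tested against equal expected counts."""
    expected = sum(observed) / len(observed)
    return sum((o - expected) ** 2 for o in observed) / expected

# Frequencies of 1st, 2nd and 3rd choices by 36 second-grade children.
table_4 = {
    "Fluent": [31, 4, 1],
    "Interjections": [3, 24, 9],
    "Repetitions": [2, 8, 26],
}

CRITICAL_01 = 9.21  # one per cent level, df = 2 (from the table footnote)

results_4 = {name: chi_square_uniform(counts) for name, counts in table_4.items()}

for name, value in results_4.items():
    verdict = "significant" if value > CRITICAL_01 else "not significant"
    print(f"{name}: chi-square = {value:.2f} ({verdict} at the 1% level)")
```

Under this assumption the calculation gives 45.50, 19.50 and 26.00, the values reported in Table 4, and each exceeds the 9.21 criterion.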


Table 5. Distribution of frequencies of choices
by 30 kindergarten children of speakers and
results of the chi-square test. Each of three
speakers used a different speech pattern.

Patterns              Choices           Chi-square*
                 1st    2nd    3rd

Fluent            21      5      4         18.20

Interjections      3     12     15          7.80

Repetitions        6     13     11          2.60

*Chi-square values of 5.99 and 9.21 (df = 2)
are required for significance at the five and the
one per cent levels, respectively.

Distributions of choices by 30
kindergarten children, along with re- 
sults of the chi-square tests, are given 
in Table 5. Apparently kindergarten 
children also strongly favor the Fluent 
Pattern. Differences among frequen- 
cies of first, second and third choices 
are significant for the Interjections 
Pattern, but not for the Repetitions 
Pattern. An examination of the dis- 
tributions, however, provides no evi- 
dence that kindergartners prefer one 
of these patterns to the other. 
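
The significance statements for the kindergarten group can be reproduced in the same way. The sketch below again assumes equal expected frequencies (10 of 30 for each rank) and uses the two criterion values from the footnote to Table 5. The names `chi_square_uniform` and `significance_level` are hypothetical helpers introduced for illustration only.

```python
# Classify each Table 5 row against the 5% and 1% chi-square criteria (df = 2).
# Counts and criterion values are copied from Table 5; names are illustrative.

def chi_square_uniform(observed):
    """Chi-square statistic for counts tested against equal expected counts."""
    expected = sum(observed) / len(observed)
    return sum((o - expected) ** 2 for o in observed) / expected

def significance_level(value, crit_05=5.99, crit_01=9.21):
    """Return the strictest level at which the statistic is significant."""
    if value >= crit_01:
        return "1%"
    if value >= crit_05:
        return "5%"
    return "not significant"

# Frequencies of 1st, 2nd and 3rd choices by 30 kindergarten children.
table_5 = {
    "Fluent": [21, 5, 4],
    "Interjections": [3, 12, 15],
    "Repetitions": [6, 13, 11],
}

levels_5 = {}
for name, counts in table_5.items():
    value = chi_square_uniform(counts)
    levels_5[name] = significance_level(value)
    print(f"{name}: chi-square = {value:.2f}, {levels_5[name]}")
```

The Fluent row reaches the one per cent level, the Interjections row reaches only the five per cent level, and the Repetitions row reaches neither. Note that a significant row shows only that choices for that pattern depart from chance; it does not by itself indicate which of the two nonfluent patterns was preferred.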


Twenty-eight of the 36 second- 
graders made at least one direct refer- 
ence to a fluency pattern in answering 
the questions. Fourteen of the 31 
children who preferred the speaker 
using the Fluent Pattern made direct 
reference to the speech pattern. 
Typical comments were ‘She didn’t 
stutter,’ ‘She didn’t say words over 
and over again’ and ‘She didn’t say 
uh, uh, uh.’ Qualitative statements in-
cluded comments that she ‘talked,’
‘sounded’ or ‘told’ the story ‘nice,’
‘best,’ ‘right’ or ‘good.’

Eight of the 24 second-graders who 
placed the speaker using the Inter- 
jections Pattern second referred 
directly to the speech pattern. Com- 
mon comments were of a descriptive 
nature, such as, ‘She just said uh, uh, 
uh.’ Fourteen children commented on 
the way in which the story was told, 
with responses such as, ‘She wasn’t so 
bad,’ ‘She just forgot what to say’ 
and ‘She just couldn’t think it up but 
she got it out.’ Many of the children 
seemed to classify this pattern as less 
desirable than the Fluent Pattern and
more desirable than the Repetitions
Pattern, with comments such as, ‘I’d
rather have her say uh, uh, uh than
stutter,’ ‘She wasn’t so bad,’ ‘Well,
the third one (Repetitions Pattern)
was real bad,’ or ‘The second one
(Interjections Pattern) was almost
like the first (Fluent Pattern).’


Four of the 26 second-graders who 
placed the Repetitions Pattern third 
gave ambiguous reasons or no reasons. 
The remaining 22 gave the following 
kinds of reasons: nine classified the 
pattern as ‘stuttering’; 11 described 
the pattern by referring to the repeti- 
tions; and two evaluated the narrator’s 
way of telling the story. 

The comments made by the kinder- 
garten children were less specific than 
those of the second-graders. They 
more frequently made qualitative 
statements about the speaker, the 
speaker’s voice or the way the story 
was told. Only 12 of the 30 children 
referred directly to the speech pattern 
in answering the questions. Twenty- 
one preferred the speaker employing 
the Fluent Pattern. Of these, only two 
made direct reference to the patterns, 
one saying, ‘The others said uh, uh,
uh, and wa, wa’ and the other saying, 
‘The others kept saying things over 
and over.’ Fourteen made qualitative 
statements about the speaker. These 
were similar to those made by the 
second-graders. 

The story teller simulating the In- 
terjections Pattern was the second 
choice of 12 kindergarten children and 
the third choice of 15. The speaker 
employing the Repetitions Pattern was 
placed second by 13 and third by 11 
kindergartners. Only five children 
specifically mentioned the speech pat- 
tern in answering the questions. For 
the most part the others offered value 
judgments, such as, ‘I didn’t like her’ 
and ‘She wasn’t as good.’ The reasons 
also included descriptive statements 
such as, ‘She talked funny’ and ‘She 
sounded like Porky Pig.’ 

The possibility that preferences 
might have been influenced by indi- 
vidual differences among speakers 
other than the simulated speech pat- 
terns was considered. An additional 
group of 18 second-grade children 
listened to the story with each speaker
using a different pattern from the one 
used previously. All 18 children 
selected the speaker with the Fluent 
Pattern as first choice. It was con- 
cluded, therefore, that differences 
among speakers unrelated to the simu- 
lated patterns were unimportant in the 
choices, particularly with respect to 
the Fluent Pattern.


Discussion 


No evidence was obtained that the 
speech patterns employed in relating 
the stories affect children’s preferences 
for stories. However, when children 
are asked to select a person as a pro- 
spective teacher, the speech pattern 
employed does appear to be a de- 
termining factor in the selections. The 
negative finding with respect to the 
effect of speech pattern upon story 
selection should thus be interpreted 
with caution. If speech patterns affect 
choice of teacher, it seems reasonable 
to assume that they also affect choice 
of story. It seems quite possible that 
the negative finding is the result of the 
order in which the stories were told, 
since position did significantly affect 
the choices. 

The second-grade children rated 
the relative desirability of the three 
speech patterns with higher agree- 
ment than did the kindergartners. 
They placed the Fluent Pattern first, 
the Interjections Pattern next and the
Repetitions Pattern last. The second- 
graders were more specific also in 
giving reasons for their preferences. 
They more often employed the label 
‘stuttering.’ They were more specific 
in stating a dislike for speech inter- 
ruptions, particularly those consisting 
of repetitions. Most of the kinder- 
gartners, however, considered the 
nonfluent patterns less desirable than
the fluent pattern, but they were less
consistent in choosing between the 
two nonfluent patterns. Their com- 
ments, however, indicated that many 
of them were aware of the speech in- 
terruptions and that generally they did 
not approve of them, although they 
were somewhat vague in stating their 
reasons for this disapproval. Apparent- 
ly, reactions against nonfluencies in 
the speech of adults were quite general 
at both age levels. 


This study was primarily concerned 
with children’s reactions to and eval- 
uations of the speech of adults. It is 
possible that the speech standards a 
child maintains for adults differ from 
those he maintains for himself or his 
peers. It seems likely, however, on 
the basis of the results of this study 
that children reflect, at a relatively 
early age, society’s critical attitude 
toward nonfluencies in speech, par- 
ticularly with reference to repetitions. 
Inasmuch as children appear to be 
aware of and to react to the non-
fluencies in the speech of others, it
seems possible that certain children
may, on the basis of socially learned
value judgments, react adversely to 
similar types of nonfluencies in their 
own speech. This apparent reaction in 
young children should be given due 
consideration in any clinical or ex- 
perimental study of the conditions 
affecting a child’s early reactions to 
nonfluencies in his own speech. 


Summary 


The purpose of this study was to 
determine whether specific kinds of 
nonfluencies, repetitions and inter-
jections, in the speech of adults in-
fluence children’s preferences for (1) 
a story and (2) a person telling a 
story. The subjects, or listeners, were 
kindergarten and second-grade chil-
dren. The experimental material con-
sisted of three stories told by three
adult narrators with three speech pat- 
terns identified as Fluent, Interjections 
and Repetitions, respectively. 

The results provided evidence that 
speech patterns affect children’s pref- 
erences for a person telling a story, 
but no evidence that they affect pref- 
erences for a story. 

The children’s answers to a ques- 
tionnaire indicated that they were, in 
general, aware of the nonfluencies and 
that they reacted against them. 